In this Issue:

In the world of computers, faster is generally better, everything else being equal.
A computer that executes more instructions per second can process more data in a given
amount of time, serve more users at the same time, and respond more quickly to changes
in its input data. Again, everything else being equal, a computer that executes more
instructions per second is a more powerful machine and therefore a "larger" computer,
although it may be physically smaller than some slower and less capable machine, and may
cost less, too, since over the years advancing technology has steadily given us larger and
larger computers for the same amount of money.

The subject of this month's issue is Hewlett-Packard's largest computer, the HP 3000 Series 64. Large
enough to handle the entire data processing needs of a good-sized company, this newest member of HP's
business computer family has 2½ times the processing power of the HP 3000 Series 44, previously HP's
largest, and nearly ten times the power of the HP 3000 Series 30, the smallest member of the family. For
example, the Series 64 can have 144 terminals attached to it, while the Series 44 can accept only 64.
The main reasons for the Series 64's greatly improved performance are two: faster operation and parallel
operation (doing more than one thing at a time). Faster operation comes from the use of emitter coupled logic
(ECL), one of the fastest commercially available integrated circuit logic families, and from some advanced
memory techniques. Parallel operation is made possible by a pair of arithmetic logic units, or ALUs, that
share the calculating and decision making that are the basic functions of a computer. With its combination of high
speed and parallel operation, the Series 64 can execute well over a million instructions per second (MIPS, in
computer jargon), a very creditable number and a real bargain at the Series 64's price, although far from
world-championship performance. (There are several computers today in the 10-15 MIPS class and a few that
approach 50 MIPS.)

Since it is an HP 3000, the Series 64 can run programs written for other HP 3000s. Because it is a highly
complex machine, its designers went to great lengths to make it reliable and easy to service, and it qualifies
for HP's money-back guarantee that it will be operational at least 99% of the time. Pictured on this
month's cover is the Series 64 system processing unit superimposed on a photograph of a printed circuit board
loaded with ECL circuits. The board carries part of the ALU section of the computer.

-R. P. Dolan

Editor, Richard P. Dolan • Associate Editor, Kenneth A. Shaw • Art Director, Photographer, Arvid A. Danielson • Illustrator, Nancy S. Vanderbloom
Administrative Services, Typography, Anne S. LoPresti, Susan E. Wright • European Production Manager, Henk Van Lammeren

HEWLETT-PACKARD JOURNAL MARCH 1982 © Hewlett-Packard Company 1982 Printed in U.S.A.

©Copr. 1949-1998 Hewlett-Packard Co. 




High-Performance Computing with 
Dual ALU Architecture and ECL 
Logic 

This largest and fastest HP 3000 Computer System can
handle all of the data processing needs of many companies.

by Frederic C. Amerson, Mark S. Linsky, and Elio A. Toschi



HEWLETT-PACKARD'S HP 3000 Computer System
family offers users a choice of compatible interactive
business systems of various sizes, prices, and
performance levels, all using HP's MPE (Multiprogramming
Executive) operating system. The new highest-performance
member of this family is the HP 3000 Series 64,
Fig. 1. The Series 64 provides over 2.5 times the processing
power of the Series 44, HP's previous performance leader.
This new high-end machine now extends the performance
of the HP 3000 Computer family into the one million
instructions/second class. Although basically a 16-bit-word
machine like other HP 3000s, it is capable of emulating a
32-bit-word machine in many applications.

The Series 64 is expected to be used in all types of EDP
(electronic data processing) and distributed data processing
applications. As a stand-alone computer system, it can
perform a wide range of tasks for a division of a large company
and can handle all the EDP needs of a medium-size
company. In a distributed processing environment, the
Series 64 can serve either as a major node or as the central
computer in distributed networks.



The increased system performance of the Series 64 comes
primarily from faster operation and performing more calculations
in parallel. The methods for achieving these results
are the use of dual ALUs (arithmetic logic units), WCS
(writable control store), cache memory, 32-bit memory addresses,
and ECL (emitter coupled logic) technology. Fig. 2
is a block diagram of the Series 64 CPU (central processing
unit).

A dual ALU design is effective only if parallel operations
can replace operations that were sequential and therefore
took longer to complete. The rich instruction set of the HP
3000 lends itself to more parallelism than simpler instruction
sets. The traditional method of achieving parallelism is
pipelining. The design of the Series 64 CPU incorporates
this traditional approach into the dual ALU design to
achieve nearly twice the efficiency of a single ALU. Pipelining
is a design organization that moves data to be processed
through a sequence of hardware operations. A piece of data
enters this pipe each clock cycle and proceeds through each
stage or operation on successive clock cycles until it is
completely processed.
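
As a rough illustration of the pipelining just described, the following Python sketch counts clock cycles for a stream of data items passing through an idealized pipeline. The three-stage depth is taken from the three-rank data path of the Series 64; the function names and the assumption that the pipe never stalls are illustrative only and are not drawn from the Series 64 design.

```python
def pipelined_cycles(n_items: int, n_stages: int) -> int:
    """Clocks needed when one item enters the pipe on every clock (no stalls assumed)."""
    if n_items == 0:
        return 0
    # The first item needs n_stages clocks; each later item completes one clock after it.
    return n_stages + (n_items - 1)


def sequential_cycles(n_items: int, n_stages: int) -> int:
    """Clocks needed when each item must finish all stages before the next one starts."""
    return n_items * n_stages
```

For ten items and three stages, the sequential organization needs 30 clocks and the pipelined one 12. As the stream grows, the pipelined cost approaches one clock per item.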







Fig. 1. The HP 3000 Series 64
Computer System can handle all
the data processing needs of a
medium-size company or a division
of a large company. In networks
it can serve as a major node
or as the central computer. Its
performance is in the one million
instructions/second class.

Fig. 2. The HP 3000 Series 64
central processing unit achieves
high performance by means of
dual arithmetic logic units, a cache
memory, a three-rank pipelined
data path, and high-speed emitter
coupled logic.



Writable control store provides faster operation than
read-only memory (ROM) for storing and reading the microcode
that implements the instruction sets of most modern
computers. The static RAMs (random-access memories)
used to design the WCS are much faster than ROMs of
similar and compatible technologies.

For increased memory bandwidth, both cache memory
and a wide memory bus are used. A cache, or small buffer
memory, provides high-speed local memory for the CPU
while interfacing to a larger but slower main store. The goal
is to design the cache such that, most of the time, requests
from the CPU for data or instructions will not require the
cache to go to main memory to get the particular address
needed. This depends very much on the software operating
on the system and on the hardware organization of the
cache. For the Series 64 running its expected job mix, the
CPU cache, with 8K bytes of data storage, provides the CPU
with about 95% of its requests in one clock cycle. The rest of
the time accesses take longer because the information must
come from main memory. However, the average memory
access time is still less than two clock cycles.
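
The cache figures quoted above can be checked with a little arithmetic. The Python sketch below computes the average access time as a weighted sum of hit and miss costs. The 95% hit rate and the one-clock hit come from the text; the miss cost is not stated in the article, so the value used in the usage note is a hypothetical placeholder, and the function names are illustrative.

```python
def average_access_cycles(hit_rate: float, hit_cycles: float, miss_cycles: float) -> float:
    """Average clocks per access: hits and misses weighted by how often each occurs."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def max_miss_cycles(hit_rate: float, hit_cycles: float, target_average: float) -> float:
    """Largest miss cost that still keeps the average at or below target_average.

    Solves hit_rate * hit_cycles + (1 - hit_rate) * m = target_average for m.
    """
    return (target_average - hit_rate * hit_cycles) / (1.0 - hit_rate)
```

With a hypothetical miss cost of 10 clocks, `average_access_cycles(0.95, 1, 10)` gives 1.45 clocks. Working backward, `max_miss_cycles(0.95, 1, 2)` shows that an average under two clocks is consistent with any miss cost below about 21 clocks.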

The interface to the memory system for the cache and I/O
system is a 32-bit-wide bus that operates at transfer rates as
high as 18M bytes/second (limited by memory speed). The
16-bit I/O buses of the Series 64 are interfaced to this high-speed
central bus by input/output adapters, or IOAs, each of
which has a 64-byte buffer. These buffers allow these I/O
ports to match the slower 16-bit I/O buses to the higher-speed,
32-bit central system bus.

One of the highest-performance logic families in wide
use is emitter coupled logic (ECL). In the Series 64, both 10k
and 100k ECL techniques were selected for those design
areas where the speed and functions available are well
matched to the requirements of the design. STTL (Schottky
transistor-transistor logic), another fast logic family, is used
for the memory array and I/O interfaces. For STTL, 10k ECL,
and 100k ECL, typical delays are 3, 2, and 1 nanoseconds
per gate, respectively, and each technology offers functions
not available in the others.



Design Objectives 

Although the main objective of the Series 64 was to extend
HP 3000 price/performance to a new high level, there
were several other important design objectives. These included
software compatibility with existing MPE-based
systems, I/O system compatibility with HP 3000 HP-IB*
peripherals, and availability of guaranteed uptime service
(GUS).

The Series 64 is software compatible with previous HP
3000 family members. MPE, the Multiprogramming Executive,
is the operating system for the HP 3000 product line.
Object code compatibility is preserved such that any application
program written on other HP 3000s in any of six
languages (COBOL, FORTRAN, Pascal, RPG, BASIC, and
SPL) will run on the Series 64.

The I/O system is the same as that found in the Series 44
and other HP 3000s using the HP-IB. As in these other
systems, the HP-IB communicates with the CPU and memory
through a channel controller board, which is installed
on the intermodule bus (IMB). In the Series 64, however,
the memory and CPU are not on the IMB. Communication is
accomplished through an I/O adapter (IOA), which interfaces
the IMB to the central system bus (CSB). Multiple
IOAs can be supported on the CSB (currently a maximum of
two), significantly increasing the I/O bandwidth. This I/O
system design provides customers with an upgrade path
for their HP-IB peripherals, HP-IB device controllers, and
channel controllers.

For GUS to be available on the Series 64, high reliability
and supportability were important design goals. Guaranteed
uptime service is Hewlett-Packard's money-back
guarantee that the CPU, cache, main memory, and I/O system
including one or two system discs are operational at
least 99% of the time. To be able to offer GUS, it is imperative
that the system have a high level of reliability, or a high
mean time between failures (MTBF), and a high level of
supportability, or a low mean time to repair (MTTR).
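
The link between MTBF, MTTR, and a 99% uptime target can be sketched with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The article does not say how uptime is measured for GUS, so treating the guarantee as steady-state availability is an assumption, and the MTBF value in the usage note is hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the system is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr_for(target_availability: float, mtbf_hours: float) -> float:
    """Longest mean repair time that still meets target_availability.

    Solves mtbf / (mtbf + mttr) = target for mttr.
    """
    return mtbf_hours * (1.0 - target_availability) / target_availability
```

For a hypothetical MTBF of 1000 hours, `max_mttr_for(0.99, 1000)` is about 10 hours: raising MTBF or lowering MTTR both move the system toward the 99% goal, which is why the design treats reliability and supportability as paired objectives.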

*HP-IB is Hewlett-Packard's implementation of IEEE Standard 488-1978.

Dual ALU Micromachine Has
Powerful Development Tools 

by Richard D. Murillo 



The central processing unit (CPU) of the HP 3000 Series 64 is
microprogrammed to interpret the dual ALU architecture of this
machine. A single line of microcode controls the execution of two
processing units. Each unit controls its own execution on
subsequent clocks and may specify the next line of microcode to
be executed. The two processing units are referred to as ALUA
and ALUB. Each processor consists of an arithmetic logic unit
(ALU) and shifter, two source operand data paths, a target data
path, logic to perform special functions such as reading and
writing memory, and logic to control the sequence of microcode
execution.

The Series 64 microcode processor uses a three-stage pipeline
to execute microcode. The three stages are referred to as RANK1,
RANK2, and RANK3. During RANK1, source operands for the two
ALUs are clocked into the registers that serve as input to the
ALUs. During RANK2, the arithmetic operations are performed and
the results are stored into the destination register. Special operations
and skip tests are also performed during RANK2. Memory
reference operations initiated during RANK2 are performed during
RANK3.

Three speeds of jumps are built into the Series 64 processor. A
fast jump (unconditional jump) is taken from RANK1. The target of a
fast jump is the next line of microcode to enter the pipeline after
the jump line. Medium-speed jumps are dependent on information
known at the beginning of RANK2, and are taken during
RANK2. In this case, the next sequential line enters the pipeline
before the target of the jump. The execution of this line is inhibited
when it enters RANK2. Slow jumps depend on the output information
of the ALUs during RANK3. In this case, the next two sequential
lines of microcode entering the pipeline are inhibited during
RANK2. Inhibition of the RANK2 execution occurs when the NOP
flip-flop is set for the respective ALU. Accordingly, the term
NOPed means inhibition of RANK2 execution. This inhibition has an
effect only on the storing of information from the ALU and the
special/skip functions. The ALU operations are still performed
and the output of the ALU may be used on subsequent lines.
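
The cost of the three jump speeds can be tallied with a small Python sketch. The number of inhibited lines per jump (none, one, or two) comes from the description above; the names `Jump` and `lines_entering_pipeline` are illustrative, and the sketch simplifies by counting every NOPed line as a wasted pipeline slot, even though the text notes that ALU operations on such lines are still performed.

```python
from enum import Enum


class Jump(Enum):
    """Each value is the number of sequential lines NOPed when the jump is taken."""
    FAST = 0    # taken from RANK1: the target is the next line into the pipeline
    MEDIUM = 1  # taken during RANK2: the next sequential line is inhibited
    SLOW = 2    # depends on ALU output: the next two sequential lines are inhibited


def lines_entering_pipeline(useful_lines: int, taken_jumps: list) -> int:
    """Total microcode lines that enter the pipe, counting lines NOPed after jumps."""
    return useful_lines + sum(jump.value for jump in taken_jumps)
```

For example, ten useful lines containing one taken jump of each speed put 13 lines through the pipeline. This suggests why a microprogrammer would favor fast or medium jumps where the needed information is available early.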

This pipelining of the microcode processor contributes significantly
to the performance of the Series 64 CPU. The microprogrammer,
however, must be aware of the effects of the
pipeline on the execution of microcode. For example, new data
stored into some registers cannot be accessed on the following
line of microcode. The store into the register on the first line takes
place in RANK2, and the read on the second line takes place in
RANK1, both during the same clock period. Therefore, the new
data is not clocked into the register until after it is read.
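
The timing effect described above can be reproduced with a toy model. In the Python sketch below, each microcode line reads one register in RANK1 and stores that value plus one into a target register at the end of RANK2. The operation, the register names, and the function `simulate` are hypothetical, and the model applies the delay to every register although the text says it affects only some of them.

```python
def simulate(lines, registers):
    """Run (source, target) lines through a two-rank model of the pipeline.

    On each clock, the line in RANK1 latches its source operand from the
    register values at the start of the clock, while the line in RANK2
    stores operand + 1 into its target at the end of the same clock.
    """
    regs = dict(registers)
    in_rank2 = None  # (latched operand, target) of the line now in RANK2
    for clock in range(len(lines) + 1):
        entering = None
        if clock < len(lines):
            source, target = lines[clock]
            entering = (regs[source], target)  # RANK1 read sees the old contents
        if in_rank2 is not None:
            operand, target = in_rank2
            regs[target] = operand + 1         # RANK2 store lands after that read
        in_rank2 = entering
    return regs
```

Starting from all zeros, the back-to-back lines `[("A", "B"), ("B", "C")]` leave C equal to 1 because the second line read B before the first line's store arrived. Placing an unrelated line between them, as in `[("A", "B"), ("A", "S"), ("B", "C")]`, lets the store complete and leaves C equal to 2.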

The microcode format is divided into fourteen fields across
two ALUs on a single line of microcode. These fields include two
input sources, operation functions and store target, special CPU
functions, and microcode skip conditions. A subfunction field may
be used to specify shifting of the ALUs or a target of a microcode
jump. One of the input source fields on either ALU can be used to
specify a short or long hexadecimal literal. Because the microcode
is Huffman-encoded, use of the subfunction or literal
options will disallow the use of some fields during the assembly of
the microcode line. For example, specifying a long (16-bit) literal
will disallow the use of special and skip fields and one of the input
source fields.

Development System

The HP 3000 Series 64 microcode development system (MDS)
is an interactive software tool used for the overall design and
development of system and diagnostic microcode. The development
system includes a microcode assembler and editor,
a hardware simulator, and a set of symbolic debugging
tools.

The microcode development system runs as an interactive
program under the HP 3000 Multiprogramming Executive (MPE)
operating system. It has a complete set of commands for entering
and saving microcode source files, editing microcode fields and
source lines, assembling microcode, and running a system simulation
(see Fig. 1). It also has a set of debugging tools and
commands. These commands consist of one or more letters and
can be uppercase or lowercase. Multiple commands on one line
may be entered by separating them with semicolons. Many commands
have parameters that may be entered as general expressions.
The system also provides a method of terminating
command execution.

The microcode development system provides a wide range of
commands for working with microcode source files. These files
are compatible with HP 3000 EDITOR source files. The system
provides the capability of entering microcode source files into a
work file, keeping the text in a permanent MPE file, and renumbering
the microcode text. Editing features of the system include
adding and deleting of text, modifying, listing, and insertion of
microcode lines, and the addition of comments into the text.

Using the BATCH command allows the microprogrammer to
assemble source microcode. This assembly process takes a
microcode source file as input and produces a microcode formatted
binary file. The microcode assembler also produces a listing
of the microcode text, a cross-reference map (labels mapped to
addresses), a listing of writable control store (WCS), which is
the layout of the microcode memory area in binary, and a listing
of the lookup tables (LUT), which is a layout of the system
macroinstruction-to-microcode-instruction mapping. The assembler
uses imbedded commands in the source file as options
to control these listings. There are also options to print top-of-page
headings and the setting of the control store address.

Fig. 1. Development sequence using MDS, the HP 3000
Series 64 microcode development system.

The LUT command allows the microprogrammer to set up the
lookup table entry for a particular system macroinstruction. It
allows the programmer to specify the binary format of the microinstruction,
the preadjustment value of the top-of-stack registers,
the displacement value, and indexing and indirect options
for memory reference microinstructions.

Simulation Package 

The microcode development system has a simulation package
that allows the microprogrammer to simulate the execution of a
microprogram so that it can be checked out before the program is
placed into the system control store. Program execution can be
simulated a single line at a time or as a free-run execution.

Execution of a microprogram is accomplished by entering the
microcode into a development system work file and using the
EXECUTE command. The execution is divided into three types:
single-line (wait) manual control execution, single-line (pause)
automatic control execution, and free-run execution of the microprogram.

Single-step execution of a microprogram allows the microprogrammer
to step through each line of microcode through the CPU
pipeline. The programmer can examine each step on the display
screen of a CRT terminal (Fig. 2). The screen gives the programmer
a visual picture of the CPU hardware buses and registers
after each line of microcode is executed. The programmer can
also examine the effects of each microcode line as it passes
through the ranks in the pipeline. In single-step mode, the screen
is updated after each microcode line is executed either manually
(programmer hits the carriage return key on the terminal) or automatically
(system updates after one second).

The microprogrammer can use the EXECUTE command to
specify the control store address where microcode execution will
start, and can specify the number of simulated clock cycles to
execute before terminating the simulation. The programmer can
also execute the next overhead (macroinstruction fetch/decode)
microcode line for software simulation and print a trace of the
simulation to a hard-copy device.

The simulator contains a memory image area for loading and
executing software programs and data. This environment area
allows the microprogrammer to simulate the execution of diagnostic
programs or special instruction test routines. Using the
STORE/RESTORE commands, a microprogrammer can store an
environment as an MPE file or restore an environment into the
simulator. This debugging tool allows the microprogrammer to
correct problems in microcode and restore test routines without
having to rebuild the environment.

Fig. 2. The programmer can step
through a microcode program one
line at a time and observe the results
on a CRT terminal. The screen
is divided into two main areas, register
display and simulated microcode
display. Register contents
are shown in hexadecimal
notation and inverse video is used
to show different flag states.

Debugging Tools 

The microprogrammer is also supplied with a complete set of
commands for use in debugging the microcode. In addition to
single-step execution and environment test routines, the programmer
can set and clear microcode breakpoints in control
store, display and modify memory locations at specified addresses,
and evaluate arithmetic expressions in three different bases:
octal, decimal, and hexadecimal. The microprogrammer can set
up to 32 different breakpoint addresses for tracing the path of the
microcode and can indicate a count for the number of times the
breakpoint is executed before the break is taken. Memory can be
displayed at the terminal or printed out on a hard-copy device.
The system also contains an expression evaluator which allows
the microprogrammer to examine and/or evaluate registers, constants,
and symbolic microcode label values in an expression.

The microcode development system was used in the overall
development of the Series 64 CPU design and for development
and coding of the HP 3000 machine instruction set, system microcode
diagnostics, and channel program microcode. This
powerful development tool was instrumental in the discovery of
hardware/firmware design problems in the early stages of the
project and the debugging of the system microcode while the
hardware was being built. The system was also used for simulating
microcode test programs, for CPU software diagnostic
routines, and for determining hardware design faults. It is hoped
that the microcode development system can be enhanced or
modified for future system designs and machine architectures.

Design for Reliability

Increased reliability is the result of an improved interconnect
scheme, a highly effective cooling system, system
design to protect against electromagnetic susceptibility,
and careful component selection and qualification. Careful
consideration of each of these areas was necessary to provide
an optimal system design.

The large number of interconnections in the CPU
cardcage demanded a dense, reliable, and cost-effective
connector scheme. With a maximum of 8M bytes of memory
in the system, there are over 8000 high-speed backplane
connections and over 1000 for the frontplane. For added
reliability, two-piece connectors were chosen over printed
circuit board edge connectors for the backplane and
frontplane, as well as for all frontplane connectors for cables.
This choice also means that gold-plated printed circuit
boards are not required, which results in a considerable cost
savings.

The cooling design is very critical to the reliability of the
system. The results are impressive. Even though the power
dissipated in the CPU cardcage is three to eight times that
found in other HP 3000 Computer Systems, the temperature
rise for a system at room temperature is, on the average, 50%
lower than for these same systems. This reduction in temperature
rise inside the system leads to a much lower junction
temperature for the components, which of course
means higher reliability.

System reliability with respect to electromagnetic susceptibility
requires that the system be immune to certain
levels of electrostatic discharge (ESD), radiated and conducted
electrical waves, power line transient noise, and
magnetic waves. The mainframe and power system design
must be done at the system level to guarantee that the
system is insensitive to these types of externally generated
interference. Grounding and shielding of the Series 64 processing
unit minimizes the effects of static discharge. Immunity
to power line transients is another important design
objective, and the design of the Series 64 ac power system
makes it resistant to injected transients of up to 1000V.

The selection and qualification of quality components
plays an integral part in the design of a reliable system. All
semiconductor devices used in the Series 64 are covered by
Hewlett-Packard's general semiconductor specification,
which defines those standards and requirements that must
be met by suppliers and devices to meet minimum quality
and reliability levels. Since ECL and 64K dynamic RAMs
were critical to the success of the Series 64, extensive testing
was done to evaluate these components thoroughly.

Three main ECL families were tested: 10k ECL, 100k ECL,
and 10k ECL RAMs. These groups were further subdivided
into the processes used to fabricate the devices. A representative
part was chosen based on complexity and use from
each family and process for qualification. The qualification
tests consisted of dynamic operating life at elevated temperatures
with parametric measurements recorded at intermediate
points from start to 1000 hours. This type of testing
is concerned with stability and allows for trend analysis.
Hewlett-Packard was then able to communicate to the vendors
any problems encountered with enough data so that
proper action could be taken.

HP has developed much experience as an aggressive user
of semiconductor memories. Through the cooperation of
the using divisions, a corporate specification was developed.
It was also possible to establish efficient and effective
evaluation and qualification procedures for 64K RAMs.
Aggressive soft error rates were specified and numerous
RAMs tested to determine the rates at the system level and
for accelerated tests. A key goal is a rate of <1000 fits
(failures per 10^9 hr) in system operation. The results were
the early establishment of qualified suppliers and devices
for Hewlett-Packard and the Series 64.
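
To give a feel for what a rate quoted in fits implies, the Python sketch below converts a per-device rate into a mean time between soft errors for a whole memory array. The article does not say whether the 1000-fit goal applies per device or per system; the per-device reading used here is an assumption, as is the chip count, which is derived from the 8M-byte maximum memory and 64K-bit RAMs while ignoring any error-checking bits. All function names are illustrative.

```python
HOURS_PER_FIT_UNIT = 1e9  # one fit is one failure per 10^9 device-hours


def failures_per_hour(fits: float) -> float:
    """Convert a rate in fits to failures per device-hour."""
    return fits / HOURS_PER_FIT_UNIT


def ram_chips(memory_bytes: int, bits_per_chip: int) -> int:
    """Data chips needed for memory_bytes of storage (check bits not counted)."""
    return (memory_bytes * 8) // bits_per_chip


def mean_hours_between_errors(fits_per_device: float, devices: int) -> float:
    """Mean time between errors when every device fails independently at the same rate."""
    return 1.0 / (failures_per_hour(fits_per_device) * devices)
```

Under these assumptions, `ram_chips(8 * 2**20, 64 * 2**10)` gives 1024 chips, and `mean_hours_between_errors(1000, 1024)` is about 977 hours, or roughly one soft error every six weeks of continuous operation at the limit of the goal.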

Assuring Supportability 

The other aspect of system availability is supportability:
the length of time it takes to isolate a failure and fix it.
Serviceability in the Series 64 is enhanced by a sophisticated
diagnostic control processor and comprehensive
fault-locating diagnostics. The diagnostic control unit, or
DCU, is a separate processor that has the capability to access
CPU registers, monitor line voltages and system temperatures,
conduct self-tests, and log errors on the system. With
extensive microdiagnostics, it can isolate hardware failures
down to the functional board level. Since microcode is
stored in RAM and not in ROM, there is no real limitation on
the amount of storage available for microdiagnostics. It was
therefore possible to develop tests that perform a hierarchical
diagnosis of the hardware, allowing quicker and more
accurate isolation of faults. As in other HP 3000 Computer
Systems, remote diagnosis is also provided.

Dual ALU Design 

Other HP 3000s have used more than one arithmetic logic
unit (ALU) to perform the arithmetic functions necessary
for the execution of instructions. However, these additional
ALUs have been very special-purpose, allowing little flexibility.
In particular, one ALU always added together the
index register and the displacement field of the instruction.
When the instruction was not indexed, it added zero to the
displacement. This logic then provided the sum to the microprogrammer
whenever it was needed. Usually, the sum
was needed only once during an instruction, so most of the
time this ALU was performing needless work. In the approach
taken by the Series 64, this second ALU is general-purpose,
so that it is useful throughout the instruction.
Thus the microprogrammer is allowed direct control of the
function of this ALU, thereby greatly increasing its efficiency.

Another approach to improving performance is adding
processors to a system. Unfortunately, the second processor
of a multiprocessor system usually requires considerable
sophisticated software to achieve the hoped-for performance.
Many complicated interactions between processors
can occur that are not problems in a uniprocessor system.
Also, some functions do not behave as they do in a uniprocessor.
For example, what does the second processor do
when interrupts are disabled by the first processor? This
must be dealt with in the software design. For another
example, the function of disabling process switching no
longer prohibits certain actions, as it does in a uniprocessor
system, since there still may be more than one process
running. Thus other more expensive mechanisms must be
devised to control the interaction between software modules
which might be running on different processors. In
general, this results in the second and successive processors
of a multiprocessor system being much less than 100%
effective.

By putting the second processor under microcode 
(firmware) control rather than software control, perfor- 
mance is achieved without complicated multiprocessor 
software. System performance benefits as if there were more 
than one processor, which is true, but the software does 
not see the added complexity; it is hidden in the microcode. 
The system benefits from the additional hardware without 
suffering the performance degradation of multiprocessor 
software. 

Parallelism in microcode means that each instruction
runs faster than in a system with parallelism in software.
Thus instruction execution time is reduced. A stream of
single instructions benefits from the additional hardware as
much as multiple timeshare jobs. In a multiprocessor system,
this performance benefit is not realized until there is
enough work to be done in parallel to keep all of the processors
busy. The richness of the HP 3000 instruction set provides
the basis for a contribution in parallelism. In a simpler
instruction set, there may not be enough work to keep two
parallel ALUs busy. The general bounds checking of the HP
3000 and powerful instructions like procedure calls benefit
from the power of two ALUs. In fact, because of some other
minor improvements, a procedure call on the Series 64
takes fewer than half the cycles of previous HP 3000 systems.

A wide microcode word makes options available to the
microprogrammer that were not available on earlier systems.
In the past, only certain registers could be used to
store addresses that were also sent to the memory system.
That has been expanded to allow any register to be used to
save the address being sent to memory.

One of the interesting anomalies of ECL logic is the relationship
between read-only memory parts (ROMs) and
random-access memories (RAMs). In past HP 3000 systems,
the microprogram has been stored in ROM because this was
more cost-effective. However, for ECL, it is more cost-effective
to use RAM, or writable control store (WCS). The
flexibility obtained makes it easy to update the microcode
when bugs are found or performance enhancements made.
The capability of loading microdiagnostics and executing
them at the customer site in the WCS gives the Series 64 the
best fault-locating diagnostics available on any of
Hewlett-Packard's systems today.

The design approach is dubbed dual ALU even though
much more than the ALUs are paired. When the decision
was made to make the second ALU general-purpose, it was
necessary to provide support for it so that it would be as
useful as the first. Original estimates for the efficiency of
this second ALU were in the vicinity of seventy percent, but
with the right support, the actual efficiency has been well
over eighty-five percent.

ALU Capabilities 

Each of the two ALUs has access to a subset of the registers
available in the processor. Some are available as one
source to each of the ALUs while others may be a source
operand to either input of both ALUs. Most registers can be
altered by one or the other ALU, but not both. The exception
to this is the top-of-stack cache registers (TOS), which are
available to both ALUs as either operand and may be altered
by both ALUs. This makes these registers extremely valuable
as scratchpad registers because they are useful for
passing data between the ALUs. Unfortunately their generality
also makes them expensive to implement, so there are
only eight of them. The output of either ALU may be passed
to its own input or that of the other ALU; this is also a
frequent method of exchanging data between the ALUs.
Scratchpads and software environment registers may be
altered by one of the ALUs and read by one or both of them.
In the case of the environment registers, this is particularly
useful since it allows both the upper and lower limits of an
address to be checked in a single microcycle. Each ALU has
a bank of 512 registers that are available only to it. By
pairing these registers it is possible to read, store, and operate
on thirty-two bit data, even though the ALUs themselves
are only sixteen-bit units.

Either ALU may specify any of a variety of special
options. These perform such useful functions as setting
condition codes on data, checking for operands that may be in the
TOS registers, setting and clearing flags, and controlling
the loop counter. Memory reads and writes are also initiated
in these fields.

Each ALU may independently skip the execution of its
half of the following line of microcode by use of its skip
field. This very powerful technique keeps both of the ALUs
busy executing useful code as much of the time as possible,
since it is necessary to skip the effect of only half the line of
microcode rather than the entire line. Also, either ALU may
specify transfer of control to another point in the microcode
by a conditional or unconditional jump instruction.

Tight control of the two ALUs is maintained by keeping
them in lock step with a single microaddress register. When
either ALU specifies a jump, both ALUs must jump to the
new location. Thus, synchronization problems that would
occur if each could be executing independently are
eliminated. Although the control of the microsequencer is more
complex than it might be for a single ALU, it is far less
complex than the mechanism that would be required to
keep two independent streams of instructions from causing
total chaos. It is impossible to get the ALUs out of
synchronization since they are both controlled by the same
microcode word. The control store is addressed by a single
register. It may be modified by either ALU, but there is only
one register. By looking at only one line of microcode it is
possible to determine precisely what each ALU will do.

Since it is possible for both ALUs to specify a jump on the
same line of microcode, a priority mechanism is necessary
to resolve conflicts. One approach would require that the
microcode never specify two jumps that could both be taken
on a single line. This would give a straightforward method
of determining the next line of microcode, but would
restrict the programmer and limit flexibility. A more
powerful approach assigns a priority mechanism if both ALUs
specify a jump at the same time. If neither of the jump
conditions is met, execution continues in sequence. If only
one of the conditions is met, execution transfers to the
location specified by that ALU. But if both conditions are
met, then one of the ALUs has priority over the other. This
structure makes it possible to perform a multiway branch in
a single line of microcode. By coupling with the skip
conditions of the preceding line, a three-way branch can be
performed on four different conditions. An insightful coder
can use this for tremendous leverage.
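
The jump-priority rule can be sketched in a few lines of Python. This is only an illustrative model, not the Series 64 microsequencer: the names JumpField and next_microaddress are hypothetical, and the choice of ALU A as the higher-priority unit is an assumption, since the text does not say which ALU wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JumpField:
    """Jump request from one ALU's half of a microcode line (hypothetical model)."""
    target: int          # microaddress to transfer to
    condition_met: bool  # True if this ALU's jump condition is satisfied


def next_microaddress(current: int,
                      jump_a: Optional[JumpField],
                      jump_b: Optional[JumpField]) -> int:
    """Pick the one shared microaddress for the next cycle.

    Assumption: ALU A outranks ALU B when both jumps are taken.
    """
    if jump_a is not None and jump_a.condition_met:
        return jump_a.target      # A taken (alone, or winning over B)
    if jump_b is not None and jump_b.condition_met:
        return jump_b.target      # only B taken
    return current + 1            # neither taken: continue in sequence


# Three possible successors from a single line: a three-way branch.
print(next_microaddress(10, JumpField(100, False), JumpField(200, False)))
print(next_microaddress(10, JumpField(100, False), JumpField(200, True)))
print(next_microaddress(10, JumpField(100, True), JumpField(200, True)))
```

Because both ALUs share one microaddress register, the function returns a single value; there is no way in this model for the two halves of the line to diverge.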

Simultaneous access to the cache memory is perceived by
the ALUs although the cache interface has only a single
port. It is possible for either or both ALUs to specify a
memory read or write on any line of microcode. The cache
memory, however, will accept only a single request from
the processor in any cycle. To reconcile the CPU to the
cache, a strict priority mechanism sorts out all accesses and
sends them to the cache in priority order. Thus, if both
ALUs request a memory access at the same time, the
lower-priority one will have its request queued until the
higher-priority one completes. In general, no more than one
request to memory is needed for each cycle in the processor,
but it is not always convenient to specify the requests on
separate cycles. The priority scheme, like that of the jumps,
allows greater flexibility to provide increased performance.

Accesses to the cache that cannot be satisfied
immediately are deferred in hardware so that until the data is
actually needed the microprocessor does not stop and wait.
There is no speed advantage to allowing both ALUs to
specify a memory access on the same line if processing halts
until both requests can be sent to the cache. It is therefore
necessary to buffer the requests in a holding register and
allow the processor to continue execution. Only when more
requests are made than can be buffered must processing be
suspended to wait for completion. Up to three requests can
be in process without overflowing the buffer.
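
A rough queueing model shows why buffering pays off. The sketch below is a simplification under stated assumptions: each microcode line takes one cycle, the cache accepts one request per cycle, requests within a line are already listed in priority order, and the buffer holds three entries. The function name simulate and the request labels are hypothetical.

```python
from collections import deque

BUFFER_DEPTH = 3  # "up to three requests can be in process"


def simulate(requests_per_line):
    """Model a single-ported cache fed by two ALUs.

    requests_per_line: one list per microcode line, holding zero, one, or
    two request labels, highest priority first.
    Returns (order in which requests reach the cache, stall cycles).
    """
    pending = deque()
    sent = []
    stalls = 0
    for line in requests_per_line:
        if len(line) > 2:
            raise ValueError("at most one request per ALU per line")
        # Suspend processing only while the new requests would overflow.
        while len(pending) + len(line) > BUFFER_DEPTH:
            sent.append(pending.popleft())
            stalls += 1
        pending.extend(line)
        # The cache accepts one request during this cycle.
        if pending:
            sent.append(pending.popleft())
    # Drain whatever is still buffered after the last line.
    while pending:
        sent.append(pending.popleft())
    return sent, stalls


order, stalls = simulate([["A0", "B0"], ["A1", "B1"], ["A2", "B2"]])
print(order, stalls)
```

In this model, two requests per line can be sustained for two lines before the processor must pause, which is the flexibility the buffering is meant to buy.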

The two sixteen-bit ALUs can be linked together to
perform 32-bit calculations. In cases where it is useful to
perform more than sixteen-bit calculations, a special option in
the microcode, called LINK, ties both of the ALUs together
to perform 32-bit arithmetic. There are many instructions in
the HP 3000 architecture that benefit from this capability. It
is particularly useful in some of the multiply and divide
instructions, which are simplified by not having to piece
together so many sixteen-bit partial results.
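
The effect of linking can be illustrated by chaining the carry of one 16-bit addition into another. This is a sketch only: which ALU handles the upper and lower halves is not stated in the text, and the function names are hypothetical.

```python
MASK16 = 0xFFFF


def alu16_add(a, b, carry_in=0):
    """One sixteen-bit ALU: returns (16-bit sum, carry out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def linked_add32(x, y):
    """32-bit add built from two linked 16-bit ALUs.

    Assumption: one ALU takes the low halves and passes its carry
    to the other ALU, which takes the high halves.
    """
    low, carry = alu16_add(x & MASK16, y & MASK16)
    high, carry_out = alu16_add(x >> 16, y >> 16, carry)
    return (high << 16) | low, carry_out


print(hex(linked_add32(0x0000FFFF, 0x00000001)[0]))
```

Without the link, microcode would need one line for the low half, a test of the carry, and another line for the high half.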

ECL Logic Design Considerations 

As mentioned earlier, the obvious reason for using
emitter coupled logic (ECL) is its high speed, but other
advantages come from ECL's low-voltage logic swings. These
advantages are low noise induced into power supplies, low
radiated noise, and low crosstalk. However, these small
logic voltage swings also result in low noise margins. Noise
margin is reduced by temperature differentials between
drivers and receivers, voltage differences caused by power
distribution, voltage noise on power buses and power
planes, and signal voltage ringing because of impedance
mismatch.

A temperature differential between driver and receiver
causes loss of noise margin if the receiver does not track the
temperature of the driver. The Series 64 limits the
temperature differential across a printed circuit board and from
board to board by using fast-moving air from a common
plenum chamber to cool the printed circuit boards. For
some signals, limiting temperature was not quite enough;
for these signals, differential drivers and receivers are used
with a common reference voltage for each driver-receiver
pair.

ECL operates with a potential difference of 5.2 volts for
10k ECL and 4.5 volts for 100k ECL. The most positive
voltage is called VCC. Any voltage difference between VCCs
in the system can subtract directly from the noise margins.
The Series 64 makes VCC the common rail or signal ground.
The CPU printed circuit boards, the frontplane, and the
backplane use two thick copper planes close to the signal
plane to distribute ground. Ground connections are
distributed along the board connectors to the frontplane and the
backplane, reducing the effects of voltage differences
between various parts of the system. VEE, the most negative
potential across the ECL ICs, is at -5.2 volts. VEE is
heavily bypassed to ground on each board by placing a
capacitor next to every other IC on the board. VEE is
distributed to the CPU by two thick copper planes on the
backplane and one thick copper plane on each printed
circuit board. Changes in VEE reduce noise margins by
twenty-five percent of the change in VEE. VTT (termination
potential, the return for the termination resistors) is
distributed to the CPU like VEE. It has much less effect on noise
margins than VEE.

The Series 64 CPU uses transmission line techniques
with closely controlled characteristic impedance on all
signal nets to maintain signal integrity and speed. In TTL
designs, it is possible for the signal to propagate up and
down the signal path three times before the signal is stable.
Signal paths with very long stubs (a stub is a branch off the
main path) will cause reflections, thus making propagation delay
longer than necessary. At higher bit rates and faster edge
speeds these reflections may combine to cause loss of noise
immunity. TTL wiring techniques use an input diode
clamp built into the IC to reduce the amplitude of the
undershoot or ringing. ECL techniques approach the
problem by matching the characteristic impedance, thus
controlling reflections. The Series 64 uses microstrip lines for
controlled characteristic impedance. A microstrip line is a
conductor strip or trace separated from a ground plane of
conductive material by a dielectric. The characteristic
impedance (Z0) can be controlled by arriving at a balance
between the trace geometries, board materials, board
thickness, and power consumption. To reduce signal ringing
and maintain signal integrity, the termination of the line
must match the Z0 of the printed circuit board. ECL is
designed to drive lines of 50 ohms or greater. The Series 64
uses 68 ohms as the nominal value of termination resistors
except for special cases. Even in these cases, controlled
impedance rules are followed.

Fig. 3. Comparing ECL and TTL design techniques (panel labels:
Stub Length, Typical Layout, Typical Layout for ECL).
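
The reason termination must match the line can be seen from the standard transmission-line reflection coefficient, (ZL - Z0) / (ZL + Z0). The short sketch below applies it to the 68-ohm nominal value quoted above. The assumption that the trace impedance is also 68 ohms, and the mismatched 50-ohm case, are illustrative and are not taken from the article.

```python
def reflection_coefficient(z_load, z0):
    """Fraction of the incident voltage wave reflected at the termination."""
    return (z_load - z0) / (z_load + z0)


Z0_OHMS = 68.0  # assumed trace impedance, chosen to match the 68-ohm terminators

matched = reflection_coefficient(68.0, Z0_OHMS)
mismatched = reflection_coefficient(50.0, Z0_OHMS)
print(f"matched: {matched:.3f}, 50-ohm terminator: {mismatched:.3f}")
```

A matched termination reflects nothing, while the mismatched case sends roughly 15 percent of the edge back down the line with inverted polarity. That is the ringing which eats into ECL's small noise margins.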

A thorough understanding of signal propagation is very
important in ECL board design. Trace layout must be done
in an orderly fashion. The designer must have complete
control of all traces so that stubs do not occur. A good
reporting scheme for the actual board layout is necessary, so
that timing can be accurately simulated for analysis.

Control of Clock Skew 

A fast cycle time requires tight control of clock skew and
system synchronization. The Series 64 has a cycle time of 75
ns. The clock skew was held to 5 ns worst-case. This is not
very significant in itself except that no special
manufacturing adjustments or considerations are necessary. This was
assured by using equal traces from the point where clocks
are generated to the point where clocks are received, by
using low-propagation-delay 10k ECL and MECL III parts,
and by distributing clocks to each board in the system by
complementary pairs. The clock distributed to each board
has a period of 37.5 ns. Each board receives a sync signal
from the DCU (diagnostic control unit) that locks in phase a
divide-by-two circuit on each board. The absence of the
sync signal stops clock generation on that board. The 75-ns
clock derived on each board is then distributed to all ICs on
the board that require a clock. Each clock has no more than
four loads. A bonus of distributing the clock in this manner
is that we are able to generate four phases of 18.75 ns for
minor cycles in the system. These are used mostly in the
memory and I/O systems.
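
The clock figures quoted above are related by simple arithmetic, worked through in the sketch below. The function names are hypothetical, and reading the four minor-cycle phases as evenly spaced quarter-cycle boundaries is an interpretation, not something the text spells out.

```python
def cpu_cycle_ns(distributed_period_ns, divide_by=2):
    """Period of the on-board clock after the divide-by-two circuit."""
    return distributed_period_ns * divide_by


def minor_phase_edges(cycle_ns, phases=4):
    """Start time of each minor-cycle phase within one CPU cycle."""
    width = cycle_ns / phases
    return [i * width for i in range(phases)]


cycle = cpu_cycle_ns(37.5)
print(cycle)                     # length of one CPU cycle in ns
print(minor_phase_edges(cycle))  # phase boundaries within the cycle
```

Each 18.75-ns phase is half of the 37.5-ns distributed period, so the faster distributed clock already carries the finer timing resolution that the memory and I/O systems use.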

There are certain parts of the CPU that require close
clock-to-clock tolerance. In these cases clock pairs are
chosen such that the clock signals requiring tight tolerance are
driven from the same IC. Distributing differentially driven
clock signals removes skew caused by temperature and
voltage changes between boards.

Partitioning and Propagation Delay 

Efficient use of each clock means performing as many
logical functions in each cycle as practical, and in the ideal
case having the same propagation delay for all paths in the
system. No path should have to wait for any other path. This
is where propagation delay becomes an important factor in
influencing partitioning of the circuitry, even more than
logical functions. ECL works best when signal paths flow
from one point to one or more receivers, as shown in Fig. 3.
Stubs must be short and lines should be distributed such
that the distance between loads is greater than 5 cm. This
allows the line to approximate a distributed capacitive
loading rather than a lumped capacitance which would
cause reflections. The Series 64 CPU accomplishes this by
bit-slicing the data path of the system into four bits for
ALU A and four bits for ALU B on each of four printed circuit
boards, called RALU boards. This enables each individual
bit line to be completely contained on one printed circuit
board. It is much easier to adhere to ECL rules if each line is
contained on one board. The four RALUs contain the data
paths for RANK1 and RANK2 (see Fig. 2). The data signals
used for carries and shifts are routed across the frontplane to
other RALUs. The frontplane is an added signal plane that
ties six of the eleven CPU boards together.
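
Bit-slicing can be pictured as splitting each sixteen-bit operand into four 4-bit pieces, one per RALU board, with only the carry crossing between boards. The sketch below uses a simple ripple carry for clarity. The real data path uses carry lookahead, and the function names are hypothetical.

```python
SLICE_BITS = 4
NUM_BOARDS = 4
SLICE_MASK = (1 << SLICE_BITS) - 1


def slice_word(word):
    """Split a 16-bit word into four 4-bit slices, least significant first."""
    return [(word >> (SLICE_BITS * i)) & SLICE_MASK for i in range(NUM_BOARDS)]


def sliced_add(a, b):
    """Add two 16-bit words one board-sized slice at a time."""
    result = 0
    carry = 0
    for board, (sa, sb) in enumerate(zip(slice_word(a), slice_word(b))):
        total = sa + sb + carry
        result |= (total & SLICE_MASK) << (SLICE_BITS * board)
        carry = total >> SLICE_BITS  # the only signal that leaves the board
    return result, carry


print([hex(s) for s in slice_word(0xABCD)])
print(sliced_add(0x0FFF, 0x0001))
```

Every data bit stays on the board that owns its slice, which is what makes the point-to-point ECL wiring rules practical to follow.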

Higher Data Path Performance 

Higher performance and increased functionality are
obtained by using high-speed 100k logic in the main data path
of the CPU. The use of 100k logic decreases the propagation
delay through the IC to one-half of what it would be for 10k
ECL. Additional care had to be taken in board layout
because of the faster rise times and reduced voltage
swings experienced with 100k ECL. The RANK1 and RANK2
data path is implemented in 100k logic, except for the store.
The store path, UBUS, uses 10k logic. Since 10k logic
should not be driven by 100k logic because of their different
threshold voltages, the CPU uses quad line drivers to
interface between 100k and 10k logic.

The increased speed and functionality of 100k ECL is
used by the main data path of the ALU, the R and S operand
registers, the R and S multiplexers, and the carry lookahead
circuit. The major advantages in functionality are in the
multiplexers and the carry lookahead ICs, where a
two-for-one reduction in parts count is achieved. The ALU can
perform eight logic operations and eight arithmetic
operations. In addition to performing binary arithmetic, the
circuit contains the necessary correction logic to perform BCD
addition and subtraction. The ability to do decimal
arithmetic is not available in standard 10k ECL without extensive
correction circuitry, which increases critical path delay.
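
The correction that BCD addition needs is the familiar add-six adjustment: whenever a 4-bit digit sum exceeds nine, six is added to skip the unused codes and generate a decimal carry. The sketch below shows that textbook technique. It is not a description of the 100k ALU part's internal logic, and the function names are hypothetical.

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits; returns (BCD digit, decimal carry)."""
    total = a + b + carry_in
    if total > 9:
        total += 6                # skip the six unused 4-bit codes
    return total & 0xF, total >> 4


def bcd_add(x, y, digits=4):
    """Add two packed-BCD values digit by digit, low digit first."""
    result = 0
    carry = 0
    for i in range(digits):
        shift = 4 * i
        digit, carry = bcd_add_digit((x >> shift) & 0xF, (y >> shift) & 0xF, carry)
        result |= digit << shift
    return result, carry


print(hex(bcd_add(0x1234, 0x5678)[0]))
```

Performing this adjustment inside the ALU part keeps it out of the critical path, which is the advantage the text attributes to 100k ECL.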

Powerful Diagnostic Philosophy Reduces
Downtime

by David J. Ashkenas and Richard F. DeGabriele 



AN IMPORTANT ATTRIBUTE of any computer system
is availability. Because we rely on computers so
heavily to increase our productivity, it is vital that
they function properly all the time. From an availability
standpoint, an ideal computer is one that never fails. Since
this goal is difficult to attain, the next best computer is one
that rarely fails and can be quickly repaired when it does.

A key factor to be considered in designing such a
machine is that it is generally difficult to isolate a faulty
component but it is usually easy to replace it. One method
of solving this problem is to increase the size of the smallest
field-replaceable unit. With fewer subassemblies to
consider, the process of determining which one to replace
becomes easier. If this approach is carried to its logical
conclusion, the entire computer can be made one unit. For
economic and other practical reasons, however, the
smallest field-replaceable unit of the HP 3000 Series 64 is a
printed circuit assembly or a power supply. An exception is
the main memory RAM integrated circuits, which can be
individually replaced.

Because of the tremendous complexity of the Series 64
(there are over 3000 ICs in the CPU alone), conventional
methods of fault isolation, such as board swapping, are both
time-consuming and expensive. Therefore, an innovative
failure diagnosis philosophy was formulated and adhered
to throughout the design and development of the computer.
The resultant system is easy to use and provides for remote
diagnosis and board-level isolation of mainframe failures.

The field diagnostics used in previous HP 3000
computers do not diagnose; rather, they test and then simply halt if
a failure occurs. The customer engineer (CE) must then
correlate the particular test that failed with a specific circuit
function being tested, determine which board performs that
function, and replace it. This requires that the CE be
familiar enough with the diagnostic to know precisely which
circuit function it was testing. In addition, the CE must be
familiar with the machine under test at the detailed block
diagram level to know how that circuit function is
partitioned over the set of boards in the system. Without this
detailed knowledge, troubleshooting is reduced to
swapping boards on a best-guess basis until the failing board is
isolated. It is expensive to train the CE to know the machine
at the detailed block diagram level. However, board
swapping is also costly: it takes time and large spare parts
inventories must be carried.

To reduce repair time and inventory costs and to increase
field productivity, the Series 64 diagnostics have been
designed to indicate which board(s) in the system could
contain the fault. To help achieve this objective, the diagnostics
are written in microcode to exercise maximum isolation
and control of the hardware being tested.

The fault-locating diagnostics are packaged so that they
obtain failure information while requiring no special
training or knowledge to use. In fact, the diagnostics are
designed to be run either on-site by the customer or remotely
from a field office so that the CE has an understanding of
which replacement boards may be needed before traveling
to the computer site. This is possible because the
diagnostics themselves identify suspected boards in order of fault
probability (see Fig. 1). Hence no special training or
knowledge of the hardware is necessary to interpret the test
results.

All of the functions available at the system console are
also available at a remote console via a modem and
telephone line. In addition to normal console capabilities, a CE
can load any diagnostic, monitor the ac and dc power
systems, and run the fault-location diagnostics without
leaving the field office. Hence a customer's computer can be
fully diagnosed before any trips to the site have been made.

Fig. 1. Fault-locating diagnostics menu and a typical test
result showing a successful diagnosis.

Diagnostic Control Unit 

The most powerful diagnostic tool of the Series 64, the
diagnostic control unit or DCU, is completely built into the
CPU. The DCU is a Z80 microprocessor-based board that
provides the sole interface between the user and the
mainframe (see Fig. 2). For the first time on an HP 3000, there are
no separate front-panel controls and maintenance panels.
All these functions can be executed from the system
console, which is connected directly to the DCU via a standard
RS-232-C serial data channel. The ROM-based program for
the microprocessor configures the DCU and the console to
allow maintenance, diagnostic, and system operator
functions to be performed while remaining transparent to the
user.

The DCU has access to all other printed circuit boards in
the CPU/MEM cardcage through the use of serial shift 
strings. Each board is designed such that many of its state- 
determining elements (e.g. registers, flags, flip-flops, etc.) 
are implemented using shift register ICs. These elements 
are connected to form a large circular shift register known 
as a shift string. There is one string for each board. The DCU 
can freeze the machine, shift data out from a selected board 
for examination and/or modification, and then return the 
shift string to the board (see Fig. 3). 
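
The shift-string mechanism behaves like a circular shift register that an outside processor can rotate one full turn to read, or load with modified contents. The sketch below models a single board's string. The class name, method names, and the eight-bit example string are hypothetical, and real strings are far longer and divided into designer-defined fields.

```python
from collections import deque


class ShiftString:
    """Toy model of one board's circular shift string."""

    def __init__(self, bits):
        self.bits = deque(bits)

    def shift_out_all(self):
        """Rotate one full turn, capturing every bit as it passes.

        Because the string is circular, the board's state is unchanged
        once the rotation completes.
        """
        captured = []
        for _ in range(len(self.bits)):
            bit = self.bits.popleft()  # bit leaves the board toward the DCU
            captured.append(bit)
            self.bits.append(bit)      # and is recirculated
        return captured

    def shift_in_all(self, new_bits):
        """Replace the board state by shifting in a full string."""
        if len(new_bits) != len(self.bits):
            raise ValueError("replacement must match the string length")
        for bit in new_bits:
            self.bits.popleft()
            self.bits.append(bit)


board = ShiftString([1, 0, 1, 1, 0, 0, 1, 0])
snapshot = board.shift_out_all()   # examine the frozen machine
snapshot[2] = 0                    # modify one state bit
board.shift_in_all(snapshot)       # return the string to the board
print(list(board.bits))
```

With the system clocks frozen, this read-modify-write cycle gives the DCU the same visibility that a hardware maintenance panel would otherwise provide.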

The DCU also controls the clocks for the Series 64 system.
It can clock the entire system or any subset of its boards from
1 to 255 contiguous clocks. The shift string and clock
control capabilities of the DCU form the basis of a number of
system control and diagnostic functions, including
loading and reading the WCS (writable control store) and
memory, system initialization, maintenance panel (i.e., register
and flag) displays, and initial microprogram loading
(power-up and cold-load sequences).

When not performing any of the above tasks, the DCU
operates the power system control (PSC) board, an interface
between the DCU and the power system of the Series 64. It is
via the PSC that the DCU can monitor the ac line voltage, dc
power supply voltages and currents, and other parameters.
This information can be displayed on both the system
console and a light-emitting diode display on the PSC. Hence
the CE can use the DCU to identify potential problems in the
power system.

Diagnostic Tests 

An equally important component of the Series 64
diagnostic philosophy is the set of diagnostic tests used to verify
the proper operation of all mainframe hardware and to
detect and isolate faults when they occur. The diagnostics
are designed to test the computer in a hierarchical fashion,
with three distinct levels of diagnosis (see Fig. 4).

This hierarchy is designed to take maximum advantage of
the built-in diagnostic features of the Series 64 to provide
board-level isolation of failures. The procedure for testing
the computer involves applying the lowest-level tests first,
and then working upward through the hierarchy, adding
more known good hardware in small increments. Thus
when a test fails, the faulty circuitry is implicitly isolated in
most cases, since all hardware required to run lower-level
tests has already been checked.
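
The isolation argument can be expressed compactly: run the levels in order, remember which hardware each passing level has verified, and on the first failure suspect only what that level added. The sketch below is illustrative. The hardware names in the example and the function run_hierarchy are hypothetical.

```python
def run_hierarchy(levels):
    """Run diagnostic levels from lowest to highest.

    levels: list of (name, hardware_used, test_fn) tuples, where
    hardware_used is a set of names and test_fn returns True on a pass.
    Returns (failing level name, newly exercised hardware), or
    (None, empty set) when every level passes.
    """
    verified = set()
    for name, hardware, test in levels:
        if not test():
            # Hardware already proven good by lower levels is exonerated.
            return name, set(hardware) - verified
        verified |= set(hardware)
    return None, set()


levels = [
    ("DCU self-test", {"DCU", "PSC"}, lambda: True),
    ("kernel hardware", {"DCU", "CPU kernel"}, lambda: True),
    ("microdiagnostics", {"CPU kernel", "data path"}, lambda: False),
]
print(run_hierarchy(levels))
```

The smaller the increment of hardware added at each step, the shorter the suspect list when a step fails.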

Level 1— Kernel Hardware Verification 

The first level of testing is designed to check the essential
core, or kernel, of hardware that is required to perform all
subsequent tests. Level 1 is composed of two separate
diagnostics: the DCU self-test and the kernel hardware
diagnostic.

The DCU self-test is a program stored in ROM on the DCU.
Initiated on power-up or via a command typed on the
console, it verifies most of the DCU and PSC hardware and
provides a pass/fail indication on both the console and a
group of LEDs on the DCU board.

The kernel hardware diagnostic verifies that portion of
the CPU hardware needed to load and run the next level of
tests, the fault-locating microdiagnostics. The kernel
hardware diagnostic consists of a set of DCU commands that
are stored on flexible disc and loaded into the DCU via the
system console for execution. In this diagnostic,
fundamental CPU operations (microsequencing, basic
microinstruction decoding, and main data paths) are checked out.
Essentially this is a single-cycle test that is both applied and
observed by the DCU. The DCU forces lines of microcode
into the CPU, clocks the system, and then checks the results.
In this manner, the basic CPU kernel is verified by an
independent processor.

Fig. 3. Each CPU or memory board is designed such that most
of its flip-flops, flags, and registers form a large circular
shift register or "shift string." This is a diagram of a shift
string and the corresponding console display. The string is
broken up into fields determined by the designer.

Level 2 — Fault-Locating Microdiagnostics 

Fault-locating microdiagnostics perform the bulk of the
testing done on the Series 64 mainframe. They are used to
isolate faults down to the board level. Primarily designed
for field use, these diagnostics will, upon a test failure, give
the user a list of boards that could contain the fault. Usually
this list will contain four or fewer boards.

Like the kernel hardware diagnostic, these tests are stored
on flexible disc and are loaded by the DCU. Unlike the
kernel diagnostic, they are written in CPU microcode. The
DCU's job is to transfer the tests to WCS, initiate them, and
interpret the results. The advantage of testing in microcode
is that hardware can be tested in small increments. Because
the microdiagnostics are quite large, they are broken into
five sections and the hardware is tested one subsystem at a
time.

To assure board-level isolation, testing is done on a
functional circuit basis. A functional circuit is defined as several
components connected to implement a given logical function.
With the knowledge of which functional circuit is under
test and how it is partitioned across the boards in the
system, it becomes possible to specify which set of boards must
contain the fault. This information is embedded within
each test. When a failure occurs, the DCU extracts this data
from the microcode and displays the suspected printed
circuit assemblies in order of fault rank.
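
One way to picture the embedded isolation data is as a table from functional circuit to the boards it spans, listed most likely first. The sketch below is purely illustrative: the circuit names, board names, and report format are hypothetical and are not taken from the Series 64 diagnostics.

```python
# Hypothetical partitioning table: functional circuit -> boards it spans,
# ordered from most to least likely to contain the fault.
CIRCUIT_TO_BOARDS = {
    "carry_lookahead": ["RALU 0", "RALU 1", "RALU 2", "RALU 3"],
    "microsequencer": ["SEQ", "WCS"],
}


def suspects_for(failed_circuit):
    """Boards that must contain the fault, in rank order."""
    return list(CIRCUIT_TO_BOARDS[failed_circuit])


def format_report(test_name, failed_circuit):
    """Build the text a console might show after a failing test."""
    lines = [f"TEST {test_name} FAILED"]
    for rank, board in enumerate(suspects_for(failed_circuit), start=1):
        lines.append(f"{rank}. {board}")
    return "\n".join(lines)


print(format_report("T12", "microsequencer"))
```

Keeping the table inside each test means the person running the diagnostic never needs to know how a circuit is partitioned across boards.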

The DCU is also used to enhance the isolation capability
of the fault-locating microdiagnostics. Some tests use the
DCU to access hardware circuits that are otherwise
inaccessible to the microcode. This is accomplished by embedding
DCU requests in the microdiagnostic. Upon receiving a
request, the DCU freezes the system, retrieves a command
from a designated register, and restarts the microprogram
after the command has been completed.

Fig. 4. The diagnostics test the computer in hierarchical fashion.
Successive tests involve increasing amounts of hardware.

Level 3 — Software Diagnostics 

The first two levels of diagnostic tests do a thorough
check of all the CPU hardware. However, they do not
rigorously test all the CPU instructions, nor do they exercise all
of main memory, nor do they verify the proper operation of
all the I/O cards and peripherals. To test all this hardware,
many software diagnostics have been developed or adapted
from other HP 3000 computers. These are written in
higher-level languages rather than microcode.

The CPU software diagnostics perform an extensive test
of all the CPU machine instructions, especially those
unique to the Series 64. These tests do not isolate faults to
the board level, but rather give a pass/fail indication.

The main memory diagnostic tests the main memory
arrays as well as error correction and logging circuitry. Any
array failures are isolated to the faulty ICs, which can be
replaced in the field.

The I/O system diagnostics are test programs that have
been adapted from those written for the HP 3000 Series 33,
40, and 44. They check the entire I/O system of the Series 64,
including all boards that can be plugged into the I/O card
cage. There are also diagnostics for a number of tapes, discs,
and printers that can be connected to the mainframe.

All software diagnostics are stored on magnetic tape and
are loaded and run by using the diagnostic utility system
(DUS), a simplified operating system. The role of the DCU
in these tests is very small. These diagnostics are intended
for use by the CE rather than the customer and require some
specialized knowledge to configure and execute.

One Hour to Repair

This combination of diagnostic hardware and microcode
allows unprecedented supportability in the field. An
indication of the success of this effort has been the achievement
of one of the primary design goals of the Series 64: an MTTR
(mean time to repair) of less than one hour. Hence the
Series 64 may be regarded as a computer with very high
availability.

A High-Performance Memory System with
Growth Capability

by Ken M. Hodor and Malcolm E. Woodward



IN THE BEGINNING, computers consisted of a simple
central processing unit with a few registers, some
memory, and an input/output system (Fig. 1). As technology
advanced, the central processing unit grew and acquired
more registers and the RAM (random-access memory) size
grew as new higher-density RAMs became available.

The overriding goal of the central processor became high
speed, and the overriding goal of RAM design became low
cost. As RAM became a larger portion of the cost of a
computer system, cheaper RAMs were developed. More
registers were used to fill the need for fast access to data.

As the conflict between memory and CPU grew, there 
evolved the cache memory. This memory contains a small 
subset of the information in the main memory array, but its 
speed of access is much closer to what the central processor 
requires. However, the cache integrated circuits are rela- 
tively expensive compared to the integrated circuits used in 
the main memory array. 

While this was going on, computer input/output systems
were developing a need for larger and larger buffer storage
areas. I/O buffers attempt to match the speed of the I/O
devices with the speed of the main memory array. As the
speed of the CPU has increased, the demand for more I/O
memory has also increased.

This is where the HP 3000 Series 64 is today. Fig. 2 is a
basic block diagram of the system. The basic modules are
the central processor and cache memory, the input/output
adapters (IOA), and the memory module (MEM). These
modules communicate with one another over the central
system bus (CSB).

The central system bus is a synchronous, general-purpose
14-MHz bus. Data transfers to and from memory
are on a 16-byte block basis. Addresses, data, and messages
share the same 32-bit path with two parity bits. Transfer
rates as high as 56 megabytes per second can be achieved,
although the current memory modules limit this to 18
megabytes per second.

Basic Information Transfer 

The basic information transfer on the CSB is either to read
data from memory or to write information to memory. For
a write to memory, a module sends the address followed by
the block of data to be written to memory. This takes a total
of five clock cycles to go across the CSB: one clock cycle for
the address and four for the four 32-bit data words.
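
The bus figures can be checked with a little arithmetic. The sketch below assumes a 14-MHz bus clock, which is what a 56-megabyte-per-second peak implies for a 32-bit path. The function names are hypothetical.

```python
BUS_CLOCK_MHZ = 14   # assumed clock rate: 56 MB/s divided by 4 bytes per transfer
BYTES_PER_WORD = 4   # 32-bit data path


def peak_rate_mb_per_s(clock_mhz=BUS_CLOCK_MHZ):
    """Rate if every bus cycle carried a data word."""
    return clock_mhz * BYTES_PER_WORD


def block_write_rate_mb_per_s(clock_mhz=BUS_CLOCK_MHZ,
                              data_words=4, address_cycles=1):
    """Effective data rate for a 16-byte block write with its address cycle."""
    payload_bytes = data_words * BYTES_PER_WORD
    total_cycles = address_cycles + data_words
    return clock_mhz * payload_bytes / total_cycles


print(peak_rate_mb_per_s())
print(block_write_rate_mb_per_s())
```

Under these assumptions the address cycle costs one fifth of the peak bandwidth on a block write. Both figures remain well above the 18 megabytes per second that the memory modules can sustain.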

For a read operation, data can be in the memory module
or in any of the caches in the system. For a read cycle a
module sends an address out to memory via the CSB and
every module looks to see if it has the most current data at
that address. Only the copy held by the last module to
access the data will be flagged as valid.

If memory has the only copy of the information, the
memory module provides it to the requesting module. If one of
the caches on the CSB has a more up-to-date copy of the data
requested, this module must give it up and supply it to the
requesting module. A module is said to abort the memory
request and become memory, supplying the data. This
feature increases the effective memory bandwidth, thus
increasing the system performance.
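
The read protocol can be sketched as a lookup in which any cache holding the valid copy answers in place of memory. This is a simplified model: the class and function names are hypothetical, and details such as write-back of the surrendered block are not modeled.

```python
class CacheModule:
    """A bus module that may hold the valid copy of some blocks."""

    def __init__(self, name):
        self.name = name
        self.valid = {}  # block address -> data


def read_block(address, memory, caches):
    """Return (data, name of the supplier) for a read on the bus."""
    for cache in caches:
        if address in cache.valid:
            # The cache aborts the memory request, "becomes memory",
            # and gives up its copy to the requester.
            return cache.valid.pop(address), cache.name
    return memory[address], "MEM"


memory = {0x100: "stale block", 0x200: "block in memory"}
cpu_cache = CacheModule("CPU cache")
cpu_cache.valid[0x100] = "fresh block"

print(read_block(0x100, memory, [cpu_cache]))
print(read_block(0x200, memory, [cpu_cache]))
```

When a cache supplies the block, the slower memory arrays are never accessed, which is how the feature raises effective bandwidth.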

The CSB is also used for sending messages. These
messages may be originated by the CPU or by one of the system
modules. Messages are used by the CPU to request the status
of the various modules and to control their operation.
Messages are used by the system modules to report status
changes and to reply to CPU requests. One example of a
message is the CPU's requesting the memory module to
report memory size. Another example of a message
transaction is the I/O adapter's reporting that a device is requesting
service.

Messages are also used extensively by the microdiagnos- 
tics. The diagnostics use messages to set up conditions for 
various test cases. They also use them in the same manner 
that the system would during normal operations. 

Memory Module 

Fig. 3 is a block diagram of the memory module. To read
from memory an address is sent over the CSB and through
the common bus interface (CBI), through the buffers on the
memory module and out to the main memory arrays
(MMAs). Each MMA responds to an address range. The



Fig. 1. Simple computer block diagram.

Fig. 2. HP 3000 Series 64 block diagram showing cache and
I/O buffer memories used to match modules that operate at
different speeds.

Fig. 3. Memory module block diagram.

addresses are generated by adders on the arrays. Each MMA
added in the system adds the array size coming from the
board next to it to the amount of memory available on its
board. This allows MMAs with various memory sizes to be
mixed within a system. The memory address goes out to all
of the MMAs and the MMA that has the correct address
range will respond with the data. The data will come out of
the array in four 39-bit words, each consisting of 32 bits of
data and seven bits for error detection and correction.
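
The adder chain can be pictured as a running total passed from board
to board. The sketch below is one reading of that description; the
function names and the board sizes used with it are hypothetical.

```python
def mma_ranges(board_sizes):
    """Return [(base, limit)] for each board in slot order; limit is exclusive."""
    ranges = []
    base = 0
    for size in board_sizes:
        limit = base + size       # adder on the board: incoming total + own size
        ranges.append((base, limit))
        base = limit              # total handed on to the next board
    return ranges


def responding_board(address, ranges):
    """Index of the board whose range contains the address, or None."""
    for slot, (base, limit) in enumerate(ranges):
        if base <= address < limit:
            return slot
    return None
```

Because each board only needs the total from its neighbor, boards of
different sizes can be mixed without any fixed address assignment.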

As the information goes through the error correction and
detection logic, single-bit errors are corrected and logged in
the error-logging RAMs. Double-bit errors are not corrected,
but are sent back to the requesting module as they came
from the MMA. An error condition is sent to the requesting
module with the bad word. To get to the requesting module
they must go through the CBI.
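
The article does not say which code the memory module uses. As an
illustration only, the sketch below uses a textbook extended Hamming
code, which happens to need exactly seven check bits for 32 data bits
and gives the same behavior: single-bit errors are corrected and
double-bit errors are detected but passed through uncorrected. All
names here are hypothetical.

```python
PARITY_POS = [1, 2, 4, 8, 16, 32]          # six Hamming check bits
DATA_POS = [p for p in range(1, 39) if p not in PARITY_POS]  # 32 data positions


def encode(data32):
    """Return a 39-bit codeword as a list of bits indexed 0..38."""
    word = [0] * 39
    for i, pos in enumerate(DATA_POS):
        word[pos] = (data32 >> i) & 1
    for p in PARITY_POS:
        word[p] = sum(word[pos] for pos in range(1, 39) if pos & p) % 2
    word[0] = sum(word) % 2                # seventh check bit: overall parity
    return word


def decode(word):
    """Return (status, data32); status is 'ok', 'corrected', or 'double'."""
    word = list(word)
    syndrome = 0
    for p in PARITY_POS:
        if sum(word[pos] for pos in range(1, 39) if pos & p) % 2:
            syndrome |= p
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 39:
        word[syndrome] ^= 1                # syndrome is the failing bit position
        status = "corrected"
    else:
        status = "double"                  # detected, data returned as read
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POS))
    return status, data
```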

For a write to memory the information comes through the
CBI and into the buffer. The RAMs cannot handle the in- 
formation as fast as the CSB can deliver it, so buffering is 
used to handle the mismatch in speed. For a write the 
address is sent to the MMAs followed by the four 39-bit 
words, 32 bits of data and seven syndrome bits for error 
correction and detection. The seven syndrome bits are gen- 
erated in the error correction and detection circuitry and 
sent to the MMAs. 

The RAMs are dynamic and must be refreshed periodi- 
cally, so there is refresh logic within the memory module. A 
refresh takes priority over an access of memory. If a refresh 
is needed it can hold off the reading or writing of data in the 
MMAs. Thus the amount of time to access memory is variable.
Refresh has a minimal effect on the effective memory
bandwidth.

The Series 64 has automatic power-fail auto-restart capability.
If the power to the machine should fail, all of the
information in the caches will be sent to memory within 5
milliseconds. Memory has battery backup power, which is
supplied to the refresh logic, RAMs, and other necessary 
circuitry to keep the information alive in memory. When 
the power returns, the system automatically resumes where 
it left off. 

Messages sent to memory are used for determining mem- 
ory size, to respond to various error conditions, for diagnos- 
tic purposes to check out various paths, and to find out if 
any read of memory has caused a single-bit error and which 
RAM caused the error condition. 

Cache Module 

The cache module consists of the cache memory array
(CMA) and the cache array controller (CAC) printed circuit
boards (see Fig. 4). The cache serves two functions. First, it
is a high-speed buffer between the CPU and the main memory
module. Second, the cache is a communications path
between the CPU and all of the other system modules.

There are five data paths to and from the cache and one
between the CAC and CMA. These are:

■ The processor data bus (PDB)

■ The cache address bus (CAB)

■ The address and data bus (ADATA)

■ The data bus (DATA)

■ The intercache bus (ICB)

■ The cache data bus (CDB).

The processor data bus provides data to the cache mem- 
ory array and commands to the cache array controller. The 
cache address bus provides the cache with address informa- 
tion from the CPU and messages from the CPU to other 
system modules. The ADATA bus provides the cache with 
addresses and data from the central system bus. The cache 
data bus provides the CPU with data from the cache and the 
central system bus. The DATA bus provides addresses and 
data to the central system bus from the cache. The intercache
bus provides an address path between the CAC and
the CMA.

The cache memory array consists of 8K bytes of high-speed
ECL RAM, which serves as a high-speed buffer between
main memory and the CPU. The array is organized
into two data sets. Each data set contains 4K bytes of data
arranged into 32-bit words. The CMA also provides the data
path for incoming messages from the system modules to the
CPU.

Fig. 4. Cache module block diagram.

The cache array controller contains the control logic for
the cache. The CAC keeps track of what data is contained in
the cache memory array. When a memory request is issued
by the CPU or any of the system modules, the CAC determines
whether the CMA contains the requested information.
If it does, the CAC takes the necessary action to supply
the data to the requester. If the CMA doesn't contain the
requested data, the actions taken by the CAC depend upon
the requester. If the requester is another system module, no
action is taken by the cache array controller. If the requester
is the CPU, the CAC tells the CPU that the data is not
available and initiates a memory request for the memory
block (16 bytes) that contains the requested data. When the
cache memory array receives the requested block, the CAC
instructs the CMA to supply the requested data to the CPU
via the cache data bus and informs the CPU that the requested
data is available.
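
A minimal sketch of the hit/miss decision follows. The article gives
only the sizes (two 4K-byte data sets, 16-byte blocks); the two-way
set-associative lookup, the byte addressing, and every name in the
code are assumptions made for illustration.

```python
BLOCK_BYTES = 16
LINES_PER_SET = 4096 // BLOCK_BYTES    # 256 blocks in each 4K-byte data set


def split_address(byte_addr):
    """Split an address into (tag, line index, byte offset within the block)."""
    offset = byte_addr % BLOCK_BYTES
    index = (byte_addr // BLOCK_BYTES) % LINES_PER_SET
    tag = byte_addr // (BLOCK_BYTES * LINES_PER_SET)
    return tag, index, offset


class Cache:
    def __init__(self):
        # tags[s][index] holds the tag stored in data set s, or None if empty
        self.tags = [[None] * LINES_PER_SET for _ in range(2)]

    def lookup(self, byte_addr):
        """Return the data set holding the block, or None on a miss."""
        tag, index, _ = split_address(byte_addr)
        for data_set in range(2):
            if self.tags[data_set][index] == tag:
                return data_set
        return None    # miss: the CAC would request the 16-byte block

    def fill(self, byte_addr, data_set):
        """Record that the block containing byte_addr now sits in data_set."""
        tag, index, _ = split_address(byte_addr)
        self.tags[data_set][index] = tag
```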



The response to a CACHE ERROR interrupt is a system halt.

I/O Adapter 

The I/O adapter, which interfaces the Series 64 to an I/O
bay, has a small cache used as a buffer. The I/O cache is
structured as a four-set associative memory. Each set is one
block deep. The article on page 18 goes into more detail.

The I/O adapter also uses the central system bus for
communications with the other modules. This module also
accesses the CSB via a common bus interface.
Bus Control Commands

Communications between the CPU and other system
modules, including the cache, take the form of messages.
The CPU initiates these messages by sending bus control
(BUSC) commands to the cache array controller. BUSC
commands are used in normal system operations and in
diagnostic operations. When the CPU issues a BUSC command,
the CAC determines whether the message is to be
sent to another system module or is to be acted upon by the
cache. The messages the cache acts upon are:

■ Read cache status 

■ Write cache status 

■ Write tag data 

■ Verify tag data 

■ Read incoming message. 

The send word message, SNDWRD, is sent to the addressed
system module via the central system bus. The
SNDWRD message is addressed to main memory, one of the
I/O modules, or the cache. The messages that are sent to the
other system modules are used to:

■ Determine system configuration

■ Report status changes of modules to the CPU

■ Read error conditions and the memory error log

■ Perform diagnostic exercising.

The cache may interrupt the CPU in two ways. The first is
a message interrupt. This interrupt occurs when the cache
array controller detects an incoming message from one of
the system modules. The CAC responds to an incoming
message by issuing the MSGINT signal to the CPU and holding
the message word until the CPU has had time to read it.
MSGINT causes the CPU to branch to the proper place in
microcode and read the incoming message. The CPU responds
to the message interrupt by issuing BUSC commands
to read the incoming message.

The second interrupt is an error condition (CACHE ERROR),
which is generated by the CAC when a catastrophic
error has been detected in the cache memory array data or in
the tag and status information relating to the CMA data.
An Input/Output System for a 1-MIPS
Computer 



by W. Gordon Matheson and J. Marcus Stewart 



WHEN THE HIGH-SPEED CPU and central system
bus for the HP 3000 Series 64 were first conceived,
considerable attention was given to the
goals and design approach for the I/O system hardware.
What emerged was a system that is compatible with earlier
systems, yet provides for expansion and future growth.

On the HP 3000 Series 30, 33, 40, and 44 Computer
Systems, all I/O channels, the CPU, and main memory
communicate on the intermodule bus (IMB). The IMB is an
asynchronous TTL-technology backplane bus that will
support up to 15 I/O ports, depending on total length and
configuration (see system diagram, Fig. 1, and IMB summary
in Table I). Previously there have been two types of
I/O ports available for the IMB: the 31262A General I/O
Channel, which connects HP-IB (IEEE 488) devices to the
system, and the 31264A Asynchronous Data Communication
Channel for RS-232-C terminals and modems. Many
kinds of peripherals attach to these two channels, including
disc memories, magnetic tapes, printers, printer interfaces,
laser printers, and CRT terminals.

The input/output hardware of the HP 300 Computer is 
similar to that of the HP 3000 family members mentioned 
above.1

Series 64 I/O System Goals 

Since the HP 3000 Series 64 is a member of a family, and
since the goal was to support many of the same peripherals
that are available for the other family members, the
strategy for the I/O system was to retain the IMB as the
primary attachment of I/O to the system if it could be made
compatible with other system requirements. This allows
an easier upgrade path for users, who can use many of the
same cables, I/O channels, and some IMB-compatible device
controllers that are used in earlier systems. Of course, it
also allowed HP to implement a system that had been proven
in other HP computers, and to minimize software driver
development, hardware design, and control program algorithm
development. Other goals of the HP 3000 Series 64
I/O system were improved I/O throughput and support of
larger numbers of peripherals than on the next fastest
family member, the Series 44.

[Diagram labels: IMB-Based Computer System (HP 3000 Series 40/44);
intermodule bus (IMB), I/O channels (GIC), device controllers, I/O cable.]

To help achieve these goals and assure that the higher
CPU performance is augmented by higher I/O performance,
three major improvements have been made:

1. The new 30144A Advanced Terminal Processor (ATP) is
used for datacomm interfacing (see article, page 22).
This eliminates much of the CPU intervention required
by previous products for datacomm transfers. It allows
attachment of up to 96 datacomm connections per channel
on the intermodule bus. The ATP has an extensive
DMA (direct memory access) facility, whereas the older
31264A Asynchronous Data Communication Channel
required channel program intervention by the processor
for every byte transferred.

2. A round-robin approach was adopted for servicing the
channel programs for devices on the general I/O channels.
This is a rotating priority scheme that gives equal
service to all devices. With large system configurations
having large numbers of high-use discs, round-robin
servicing eliminates the lockout of lower-priority devices
by higher-priority devices.

3. Provisions for multiple I/O buses. The throughput of the
central system bus is maximized if several intermodule
buses are able to attach to the CSB and perform DMA
simultaneously. The throughput of the IMB is much
lower than that of the CSB, and the CPU averages a small
portion of the maximum bandwidth of the CSB, so multiple
IMBs can provide enhanced DMA throughput and
allow for attachment of larger numbers of I/O channels
than could be supported by a single IMB.
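
The difference between fixed priority and the round-robin servicing of
item 2 can be seen in a small simulation. This is only a sketch of the
general technique; the function names are hypothetical, and the eight
devices per channel come from the device reference table description
later in this article.

```python
def fixed_priority(requests, _last):
    """Lowest-numbered requesting device always wins."""
    return min(requests)


def round_robin(requests, last, n_devices=8):
    """Search from the device after the one served last, wrapping around."""
    for step in range(1, n_devices + 1):
        candidate = (last + step) % n_devices
        if candidate in requests:
            return candidate
    return None


def service_order(policy, requests, rounds):
    """Serve a set of continuously requesting devices for several rounds."""
    last, order = -1, []
    for _ in range(rounds):
        last = policy(requests, last)
        order.append(last)
    return order
```

With devices 0, 1, and 5 requesting continuously, the fixed scheme
serves device 0 every time, while the rotating scheme visits all three
in turn.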
Fig. 1. HP 3000 family input/output system diagram. All I/O
channels, the CPU, and main memory communicate on the
intermodule bus (IMB).



16 HEWLETT-PACKARD JOURNAL MARCH 1982

© Copr. 1949-1998 Hewlett-Packard Co.



While the I/O subsystem for the HP 3000 Series 64 is
designed for optimized attachment to the IMB, it was developed
under a consistent set of system rules for I/O operations
and hardware interface. Non-IMB types of I/O
hardware may be attached to the system if the need arises,
and the HP 3000 Series 64 will be able to keep pace with
future I/O system developments.

A final and extremely important goal was to make I/O
hardware problems easy to diagnose accurately and fast to
repair. This means that it must be possible to check each
level of hardware interface quite thoroughly by diagnostics
before extending testing to levels farther out. For example,
an untested logical block on an I/O channel could cause a
channel failure to be diagnosed as a device failure. To
achieve the goal of diagnosability, a significant part of each
hardware component consists of diagnostic features
that allow on-board loopback of data and control bits,
checking of error detection logic, and simulating operations
that normally involve the next level of hardware.

I/O Adapters 

The CSB was developed for very high-speed data transfers
and synchronized operation. Its implementation was
necessary to achieve the memory and CPU performance
goals for the system. Unfortunately, the CSB can't be used
for direct attachment of input/output channels. The main
difficulty is that it is necessary to restrict the physical
length of the bus to maintain the capability of high transfer
rates; otherwise, it would take too long for a signal to propagate
from one end of the bus to the other. Physically, the
CSB is less than 15 cm long. This places severe restrictions
on the number of printed circuit assemblies that can be
plugged into the bus. For this and other reasons, a separate
I/O bus is required for the HP 3000 Series 64, and the IMB
was chosen to do the job. All I/O channels (general I/O
channels, advanced terminal processors, intelligent network
processors, etc.) plug into an IMB-type backplane in
the I/O card cage, and an I/O adapter has been developed to
provide for communication between the two buses.

An I/O adapter, in general, is any interface between the
CSB, which supports the CPU and main memory, and
another bus that supports I/O channels and devices. On the
HP 3000 Series 64, the present I/O bus is the IMB, but some
other bus could be used so long as the proper I/O adapter is
provided.

As an integral part of the Series 64, the I/O adapter is
required to perform the following functions:

■ Electrical translations. The CSB is an ECL-level bus. If the
I/O bus uses other logic levels (the IMB, for example, is a
TTL bus), it is up to the I/O adapter to provide translation
between the two.

■ Command and service request translations. On the CSB,
all commands and requests are sent in the form of messages.
An I/O bus may have a similar scheme, or it may
use separate lines for service requests and for issuing
commands, as the IMB does. In either case, the I/O adapter
must translate requests for service on the I/O bus,
whatever their form, to messages addressed to the CPU,
and messages from the CPU must be properly translated
to commands on the I/O bus that use the proper handshaking
protocol.

■ Memory buffer. All memory transactions on the CSB take
place in 16-byte chunks in the form of four contiguous
32-bit words transferred in four successive clock cycles.
If the I/O bus uses any other size as its minimum addressable
data block (on the IMB, it is one 16-bit word), the I/O
adapter must provide some sort of buffer to convert one
size to the other.

■ Synchronization to system clock. The CSB is a fully synchronous
bus. The I/O bus may be either asynchronous, as
the IMB is, or synchronized to another clock. In either
case, it is up to the I/O adapter to synchronize the I/O bus
to the CSB.
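
The memory-buffer function in the list above is at heart a size
conversion between eight 16-bit IMB words and one 16-byte CSB block.
The sketch below shows one way to do it; the word ordering (first
16-bit word in the upper half of each 32-bit word) and the function
names are assumptions, since the article does not specify them.

```python
def pack_block(words16):
    """Combine eight 16-bit words into four 32-bit words."""
    if len(words16) != 8:
        raise ValueError("a CSB block holds exactly eight 16-bit words")
    # Assumed ordering: the earlier word goes in the high-order half.
    return [(words16[i] << 16) | words16[i + 1] for i in range(0, 8, 2)]


def unpack_block(words32):
    """Split four 32-bit words back into eight 16-bit words."""
    out = []
    for w in words32:
        out += [w >> 16, w & 0xFFFF]
    return out
```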

The Series 64 system currently supports up to two I/O 
adapters, making it possible to increase significantly the I/O 
bandwidth of the system, since two separate I/O buses can
transfer data independently at the same time.

Interfacing the IMB to the Series 64

The IMB I/O adapter for the Series 64 consists of three
printed circuit assemblies and the associated interconnecting
cables. The three printed circuit assemblies are the
common bus interface (CBI), the I/O buffer (IOB), and the
IMB interface (IMBI) assemblies. They are connected as
shown in Fig. 2. The IOB and CBI both reside in the main
(CPU) cardcage and the IMBI resides in the IMB cardcage.
The common bus interface is the universal assembly that
interfaces modules on the central system bus to the CSB
itself. Communication signals between the I/O buffer and
the common bus interface travel along the backplane and
through a very short 2-cm frontplane cable. The I/O buffer is
connected to the intermodule bus interface with two 1.5-m
transmission line cables. Each of these cables carries 30


Table I

Intermodule Bus (IMB) and Central System Bus (CSB)
Characteristics

Characteristic           IMB                       CSB

Technology               TTL logic                 ECL logic
Address bus              [...] bits                32 bits, multiplexed
Data bits                16 bits
Handshake                asynchronous, separate    synchronous, multiplexed
                         address, data             address, data
Memory data              16-bit word               blocks of four 32-bit words
Memory operations        read, write word          variety of read/write block
I/O operations           I/O read, write           two types of 32-bit
                         (broadcast/selective)     messages
DMA access               slot number priority      rotating priority
Channel program request  asynchronous signal       unsolicited messages
Interrupt request        asynchronous signal
Fig. 2. The I/O adapter interfaces the Series 64 to the IMB. It
consists of three printed circuit assemblies.

signal and 60 associated ground conductors, providing balanced
92-ohm paths for the TTL-level interface signals between
the IOB and IMBI.

In the IMB I/O adapter, signal level translation between
the central system bus and the intermodule bus is done by
the I/O buffer. Referring again to Fig. 2, the broken line
across the IOB indicates the boundary between ECL-level
signals and TTL-level signals. Except for the level translators,
the IOB's internal logic is implemented with ECL
devices.

Another general requirement of an I/O adapter is the
translation of the sixteen-byte memory transfers on the CSB
to whatever size the I/O bus uses. The IMB works with one
sixteen-bit word at a time, and the IOB takes care of the
buffering between the two. The IOB uses a four-set, fully
associative cache with a two-cycle (150-nanosecond) access
time as the memory buffer. "Fully associative" means that
the IOB can store copies of up to four sixteen-byte blocks
from main memory at the same time, and any of the four
blocks can be associated with any arbitrary memory address.
In terms of performance, this means the IOB can
support up to four simultaneous high-speed direct memory
accesses (DMAs) without any thrashing
(excessive swapping) in the cache. Besides acting as memory
to the intermodule bus, the IOB supports central system
bus memory activity by checking addresses on the CSB and
providing any blocks requested for which it has a valid
copy.
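
The claim about four simultaneous DMA streams can be illustrated with
a small model of a four-entry fully associative buffer. The class and
method names are hypothetical, and the oldest-first replacement policy
is an assumption; the article does not describe how the IOB chooses a
block to replace.

```python
BLOCK_BYTES = 16


class IOBuffer:
    def __init__(self, entries=4):
        self.entries = entries
        self.blocks = {}    # block address -> 16 bytes of data

    def access(self, byte_addr, fetch_block):
        """Return 'hit' or 'miss' for one 16-bit IMB access."""
        block_addr = byte_addr - byte_addr % BLOCK_BYTES
        if block_addr in self.blocks:
            return "hit"
        if len(self.blocks) == self.entries:
            # Assumed policy: evict the block that was loaded first.
            oldest = next(iter(self.blocks))
            del self.blocks[oldest]
        self.blocks[block_addr] = fetch_block(block_addr)
        return "miss"
```

With four interleaved streams that each walk through their own block
one 16-bit word at a time, every stream misses once and then hits on
its remaining seven words. A fifth concurrent stream would start
forcing evictions.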

Still another requirement of the I/O adapter is translation
of commands between the CSB and the I/O bus (see Fig. 3).
The IMB I/O adapter's specific task is to translate channel
program service and interrupt requests on the IMB to messages
to the CPU, and to translate messages from the CPU
into global or addressed commands on the IMB. Most of this
is done by the IMBI, with the IOB acting as a level translator
and traffic controller. When a request line on the IMB is
asserted, the IMB I/O adapter formulates a 32-bit unsolicited
message and transmits it to the CPU, then disables further
messages until they are reenabled by commands from the
CPU. IMB and I/O adapter commands are sent as a single
32-bit message from the CPU. When the I/O adapter completes
the handshaking of the command it formulates a
32-bit response message containing 16 bits of status (parity
error on command, IMB handshake timeout, etc.) and the
contents of the IMB data bus when the command execution
is completed.

One of the biggest challenges for the IMB I/O adapter is
that of synchronization. Since the IMB is a totally asynchronous
bus and the CSB is totally synchronous, all transactions
on the IMB have to be synchronized with the Series
64's master clock before any interaction with the CSB can
take place. All of this is done by the IMBI, which is itself
synchronized with the system clock. What makes this task
challenging is the high speed of the Series 64's master
clock; the cycle time is only 75 ns. The normal method for
synchronizing any signal is to run it through a series of
D-type flip-flops, with each successive stage minimizing
the chance of an unstable output at the end of the chain.
Normally, the higher the clock speed, the more levels of
synchronization are needed to assure stability. However,
each level of synchronization adds delay to any signal that
must be synchronized and reduces the bandwidth of the
bus. The IMBI, therefore, uses a specially designed state
transition system that imposes minimum extra delay on the
signal being synchronized by being tolerant of metastable
states. Thus the IMBI is designed with only one level of
synchronization on each signal, with the state machine
adding the extra levels needed for high reliability.

Expansion and Growth 

To support more than one IMB for the performance and
configurability issues discussed earlier, a few modifications
were necessary both to software and to the IMB.

The I/O bay is wide enough to support a 24-slot IMB
backplane, which is divided into two IMBs of eight slots
and sixteen slots respectively (see Fig. 4). The IMBI resides
in the slot nearest the CPU bay of each IMB. Channel and
device controller boards may be inserted in the remaining
slots with a few configuration guidelines. All cables leaving
the system do so through a junction panel, which provides a
solid chassis ground for them. The larger IMB accommodates
a reasonable system I/O configuration before space or performance
requirements indicate the need for splitting the
bandwidth or adding more channels than can fit onto a
single IMB. The closer an I/O channel is to the IMBI, the
higher its priority for memory access.

Fig. 3. The I/O adapter translates various commands between
the central system bus (CSB) and the I/O channel.

The Series 64 has a device reference table structure similar
to the other members of the family. That is, for each of the
eight devices per channel, a four-word entry in main memory
provides the following information:

1. The address of the channel program for that device.

2. The address of the channel program variable area for the
device (to save parameters relating to execution of the
channel programs).

3. The label of the interrupt handler code segment for external
interrupts from the device.

4. A word for run-time execution state information for the
device (e.g., the reason a channel program is suspended).

The total device reference table length has been expanded
from 120 to 512 entries to accommodate up to four I/O
adapters for future growth, and its starting location is not
fixed in hardware, but has been made indirect through a
pointer in reserved main memory.
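
The entry counts are consistent with simple arithmetic: 15 channels of
eight devices give 120 entries, and four adapters with 16 channel
numbers of eight devices give 512. The sketch below shows how an entry
address might be computed from the pointer. The ordering of the index
(adapter, then channel, then device) and all names are assumptions;
the article states only the sizes.

```python
WORDS_PER_ENTRY = 4
DEVICES_PER_CHANNEL = 8
CHANNELS_PER_ADAPTER = 16    # channel numbers 0-15; channel 0 is unused on the IMB
ADAPTERS = 4


def drt_entry_address(drt_base, adapter, channel, device):
    """Word address of a device's four-word entry (assumed table layout)."""
    index = (adapter * CHANNELS_PER_ADAPTER + channel) * DEVICES_PER_CHANNEL + device
    return drt_base + index * WORDS_PER_ENTRY


old_entries = 15 * DEVICES_PER_CHANNEL
new_entries = ADAPTERS * CHANNELS_PER_ADAPTER * DEVICES_PER_CHANNEL
print(old_entries, new_entries)
```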

The RMSK and SMSK instructions (read mask and set
mask) specify which I/O channels are allowed to generate
interrupts to the CPU. These instructions also had to
change, since software is providing for four I/O adapters,
each of which can have 15 channels (channel 0 is not used
on the IMB). The interrupt masks are now contained in
words 32-35 of memory.
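
Four 16-bit words give one mask bit for each of the 16 channel numbers
on each of four adapters. The sketch below shows one possible mapping.
The layout (one word per adapter, bit 0 as the most significant bit)
is an assumption made for illustration, as are the function names;
only the word range 32-35 comes from the text.

```python
MASK_BASE_WORD = 32          # first of the four mask words named in the text
CHANNELS_PER_ADAPTER = 16


def mask_location(adapter, channel):
    """Return (memory word, bit value) for one channel's interrupt enable."""
    if not (0 <= adapter < 4 and 0 <= channel < CHANNELS_PER_ADAPTER):
        raise ValueError("adapter must be 0-3 and channel 0-15")
    # Assumed layout: bit 0 is the most significant bit of the 16-bit word.
    return MASK_BASE_WORD + adapter, 1 << (15 - channel)


def enable_channel(memory, adapter, channel):
    """Set the enable bit in a dict that stands in for main memory."""
    word, bit = mask_location(adapter, channel)
    memory[word] = memory.get(word, 0) | bit
```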

Similarly, all of the I/O instructions have been expanded 
to include specification of the I/O adapter number. 

It is a testimonial to the modularity of the I/O system
software that these changes were implemented with minimal
time and effort, since only a few common system subroutines
required modification. The MPE-IV operating system
will work equally well with single-IMB systems or with
the Series 64, since it can distinguish the hardware system
under which it is running.

An intelligent processor on an IMB might need to know
where its device reference table is, if it is executing its own
channel programs. Therefore, two extra lines were added to
the IMB to let the smart channels recognize the IMB to
which they are attached.

Summary 

The HP 3000 Series 64 I/O system uses much of the same
hardware and supports the same peripherals as the Series
30, 33, 40, and 44 while providing an expansion path to
meet the needs of users of larger and faster systems. This
was also accomplished with relatively minor changes in the
I/O software. Yet there is also provision for adaptability to
future requirements in the I/O area.

Fig. 4. The Series 64 I/O bay holds two intermodule buses
(IMBs) of eight and sixteen slots.
The Advanced Terminal Processor: A New
Terminal I/O Controller for the HP 3000 



by James E. Beetem 



THE ADVANCED TERMINAL PROCESSOR (ATP)
is a new terminal I/O controller designed specifically
for the HP 3000 Series 64. Although the basic
ATP design is compatible with any HP 3000 that uses the
intermodule bus (IMB), its design center is a huge system
with the processing power of the Series 64.

The HP 3000 Terminal I/O Environment 

The first step in designing a new terminal I/O controller is 
to establish the maximum number of terminals to be sup- 
ported and the demands that each terminal will place on the 
system. The terminal I/O loads expected on the Series 64
can be predicted by projecting the same loads on the widely
used HP 3000 Series III to a system with four to six times its
processing power. The resulting numbers can be adjusted
further to allow for the trends that are apparent in the HP
3000 applications environment.

A typical HP 3000 Series III has IS terminals connected, 
three of them via modems. Usually, during the busiest part 
of the day, 25 of the terminals have active sessions and 20 
terminals are actually in use. About two-thirds of the terminals
are supplied by HP and the rest come from a wide
variety of vendors. 

The maximum supported number of terminal I/O ports for 
the Series III is 64. All 64 may be operated as modem ports,
although the typical system has only three modem ports. 

The number of terminals that an HP 3000 can operate 
with acceptable user response times is strongly dependent 
upon the demands the application package places on the 
system. Customers have reported satisfactory response 
times with their application package supporting as few as 
ten and as many as 48 terminals. The typical numbers 
chosen for this analysis represent a 75th percentile case — 
only about 25% of HP 3000 customers have more terminals
connected to their systems. 

On the typical HP 3000 Series III, the average I/O load per
terminal during peak hours is about 20 characters per second.
The load is generated by processing one user transaction
every 20 seconds at each terminal. A transaction involves
writing a message to the screen of the terminal and
reading in the operator's reply. Usually, the output is about
300 characters and the input is about 100 characters. The
transaction time and the number of I/O characters vary over
a very wide range and are strongly dependent upon the
specific application being run.

The actual I/O load in any short period of time varies
widely from the average. Peaks of up to 1000 characters per
second occur occasionally when several terminal devices
operate concurrently.

Several trends are evident that will have an impact on
terminal I/O. First, as terminal prices decline, more and
more terminals that are lightly used are attached to the
system. These terminals demand much less of the system's
resources. On the other hand, the new intelligent terminals
have substantial processing power and frequently move
masses of data to and from the system. These devices can
generate very heavy I/O loads.

Another class of devices placing significant terminal I/O
loads on the system consists of character printers such as
the HP 2631B and HP 2601A. When in operation, these
devices may require as many as 180 characters per second.
Because these devices tend to operate during the system's
peak loads, they can sharply increase the average I/O loads.

Most of the newer terminal devices allow higher line
speeds. Almost all terminals now offer a maximum line
speed of 9600 bits per second (bps). New designs call for a
speed of 19,200 bps. These higher data rates will cause
fewer but higher peak I/O loads.

Based on this analysis, the ATP for the Series 64 was
designed to allow a maximum configuration of 256 terminals.
It expects to have about 200 terminals connected in a
typical installation. About 125 devices will have active
sessions with 100 actually in use at any given time. The
average I/O load is expected to be about 4000 characters
per second with occasional peaks of 20,000 characters per
second. However, it is important that the reader understand
that the numbers developed in this section are design centers
and maxima for a subsystem within the Series 64. Constraints
external to the terminal I/O subsystem may not
allow it to achieve its design maxima.
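
The load figures in this section can be cross-checked with a few
lines of arithmetic. The per-terminal average follows directly from
the transaction sizes given for the Series III. Multiplying it by the
200 terminals of a typical Series 64 installation reproduces the
4000 characters per second quoted above, although the article does not
say that this is how the figure was derived. The function name is
illustrative.

```python
def average_load(output_chars, input_chars, seconds_per_transaction):
    """Average characters per second generated by one terminal."""
    return (output_chars + input_chars) / seconds_per_transaction


per_terminal = average_load(300, 100, 20)     # Series III figures from the text
series_64_average = 200 * per_terminal        # 200 connected terminals
print(per_terminal, series_64_average)
```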

Other Design Considerations 

The trend to higher line speeds causes another problem.
The RS-232-C standard used to connect terminals to the HP
3000 has a limited speed/distance relationship. In theory,
devices must be within 16 metres of the system. In actual
installations, these limitations are often exceeded without
problems at slower line speeds. At 9600 and 19,200 bps, the
speed/distance relationship is a real limitation, restricting
successful high-speed operation to terminals that are relatively
close to the system.

A new modem interface standard is emerging: RS-449.
When these new devices become widely used, the HP 3000
Series 64 will need to support them. Provisions are made in
the new controller to allow customers to operate both RS-232-C
and RS-449 types of modems on a single Series 64.
Flexible junction panels support both modem standards,
the existing direct connection standard, and a new direct
connection standard with an improved speed/distance relationship.

Another consideration is how terminals are operated.
Current HP 3000 software operates terminals in one of two
modes. Each mode makes sharply differing demands upon
the terminal I/O system. In character mode, a human
operator is typing characters and using the editing facilities
provided by the host computer system to prepare input data
strings. Characters arrive at the computer infrequently, but
each character must be compared to a large special character
set and the appropriate actions must be taken.

In block mode, the terminal provides the editing
facilities. When the input data string is ready, the terminal
transmits it as one continuous block of data. The data arrives
as fast as the terminal can transmit it, usually close to
the line speed. At 9600 bps, one character arrives each
millisecond. Very little processing is required, only a check
for the character that marks the end of the block.
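
The one-millisecond figure can be checked with a little arithmetic. The sketch below assumes ten bits on the line per character (one start bit, eight data bits, one stop bit). That framing is an assumption, chosen because it agrees with the article's figure of 1920 characters per second at 19,200 bps.

```python
# Assumed framing: 1 start + 8 data + 1 stop = 10 bits per character.
BITS_PER_CHAR = 10

def char_time_ms(line_speed_bps):
    """Time between characters, in milliseconds, on a saturated line."""
    return 1000.0 * BITS_PER_CHAR / line_speed_bps

for bps in (2400, 9600, 19200):
    print(f"{bps:>6} bps: {bps // BITS_PER_CHAR:>5} char/s, "
          f"{char_time_ms(bps):.2f} ms per character")
```

Under this assumption a 9600 bps line delivers a character roughly every 1.04 ms, which is the "one character each millisecond" quoted above.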

Under the current MPE (Multiprogramming Executive)
terminal I/O system, the terminal I/O controller sometimes
does not know which mode is in effect. Therefore, the ATP
is designed to handle the character mode processing load at
the block mode data rate.

ATP Design 

The key element in the ATP design is the 6801 single-
chip microcomputer. This device is programmed to provide
the functions needed to support an HP 3000 terminal port. It
is called the port controller chip (PCC). One of these chips is
used for each terminal port.

The ATP hardware is designed around the PCC, providing
the interfaces to the intermodule bus and to the terminal
or modem (Fig. 1). The ATP software is designed for
flexibility, to allow easy implementation and support of a
wide range of facilities.

Port Controller Chip (PCC) 

The 6801 PCC contains an enhanced 6800 microprocessor,
2K bytes of read-only memory (ROM), 128 bytes of
random-access memory (RAM), a universal asynchronous
receiver transmitter (UART), and a timer with an
edge-triggered counter. The UART provides the bit-serial
interface to the terminal, eliminating the need for a separate
UART chip. The microprocessor and its memory provide
the processing power to do character handling without
burdening the system processor. The PCC can handle the
full line speed of 19,200 bits per second (1920 characters
per second) in all modes of operation.

Since the PCC's microprocessor is dedicated to one port,
the ROM-based software that operates the PCC can be very 
simple. There is no need for software to share resources or to 
resolve contention problems. The PCC is a simple slave 
processor, dedicated to doing a single task very quickly and 
efficiently. 

The PCC checks the input and output data streams for 
special characters. System software may define two input
edit sets and enable either of them for comparison with the 
input data stream. Two output sets are defined, one to scan 
the output stream and one to scan the input data stream 
during an output. These facilities give software complete 
control of the full-duplex operation of the PCC without 
impact on the system processor. 

The PCC handles both the ENQ/ACK and X-ON/X-OFF flow
control handshakes that prevent data overruns at the terminal.
Also, it generates the time delays required by unbuffered
teletypewriter devices that must perform physical
movement of the carriage or print mechanism. Both the ENQ
and ACK characters may be redefined by system software.

The PCC generates output parity and checks input parity
when parity is enabled. In the 7-bit data mode, it clears the
eighth input bit to zero and sets it to zero or one on output.
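
As an illustration of what parity generation and checking involve, here is a minimal sketch for a 7-bit character with the parity bit carried in the eighth bit. The function names and the choice of even or odd parity as a parameter are assumptions for illustration only; the article does not describe the PCC firmware at this level.

```python
def add_parity(ch7, even=True):
    """Return an 8-bit value: 7 data bits plus a parity bit in bit 7."""
    data = ch7 & 0x7F
    parity = bin(data).count("1") & 1   # 1 if the data has an odd number of ones
    if not even:
        parity ^= 1                     # odd parity: make the total count odd
    return data | (parity << 7)

def check_parity(byte8, even=True):
    """True if the received 8-bit value has the expected parity."""
    ones = bin(byte8 & 0xFF).count("1")
    return (ones % 2 == 0) if even else (ones % 2 == 1)

def strip_eighth_bit(byte8):
    """7-bit data mode on input: clear the eighth bit."""
    return byte8 & 0x7F
```

For example, `add_parity(0x43)` sets the eighth bit because 0x43 contains three one bits, and `check_parity` on the result then succeeds.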

The PCC is provided with three paths to or from system 
memory. One path is used to send control information
(orders) to the PCC from system software. This facility replaces
the channel programs used by other intermodule-bus-based 
I/O devices and eliminates the need for channel program 
processing facilities. 

The remaining two paths provide an input path to and an
output path from system memory for data. Having two
unidirectional paths rather than one bidirectional path
allows the PCC to perform both input and output of data as the
result of a single control order. Also, it allows the switch
from output to input to occur very quickly, minimizing the
ATP's response time.

The PCC does not have access to the address registers for
these paths. This implementation simplifies the hardware
design, but requires the PCC to interrupt the system processor
to handle the backspace and delete-line special characters.
These line-edit functions occur rarely and usually at
human typing speeds, so the workload they generate for the
system processor is very low.

The PCC's timer is used for most terminal timeouts while
the edge-triggered counter allows the PCC to do speed sensing
accurately at all supported line speeds by measuring the
width of the asynchronous protocol's character start bit.

A portion of the PCC's RAM is used for a 16-character
input buffer. An input buffer of this size allows the ATP
subsystem a full 16 character times to respond to a software
interrupt (or eight milliseconds at the 19,200 bps line
speed) before a data overrun can occur.
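
The eight-millisecond figure follows from the buffer size and the character rate. A rough check, again assuming ten line bits per character (an assumption consistent with 1920 characters per second at 19,200 bps):

```python
BITS_PER_CHAR = 10       # assumed: start + 8 data + stop
BUFFER_CHARS = 16        # PCC input buffer size, from the text

def overrun_deadline_ms(line_speed_bps, buffer_chars=BUFFER_CHARS):
    """Time to fill the input buffer on a saturated line, in milliseconds."""
    chars_per_second = line_speed_bps / BITS_PER_CHAR
    return 1000.0 * buffer_chars / chars_per_second

print(round(overrun_deadline_ms(19200), 2))   # 8.33
```

At slower line speeds the same buffer gives proportionally more time, for example about 16.7 ms at 9600 bps.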

ATP Hardware 

There are three elements in the ATP hardware design.
The first element is an interface on the intermodule bus.
This is provided by the system interface board (SIB). Then
there is logic to resolve the asynchronous contention of
many PCCs for the shared facilities. The board that contains
this logic is called the asynchronous interface board
(AIB). Last, there are junction panels to provide the terminal
and modem interfaces (see Fig. 2).

System Interface Board (SIB) 

Functionally, the SIB is a byte multiplexer channel optimized
for the port controller chip (PCC) and terminal I/O.
On one side, the SIB provides an interface to the intermodule
bus (IMB), the standard I/O bus for the HP 3000. On
the other side, it controls the ATP bus, which connects the
SIB to as many as eight asynchronous interface boards
(AIBs). This design allows one ATP subsystem to provide
96 terminal I/O ports while occupying one IMB channel
address and nine IMB I/O slots. These are important
considerations for a Series 64 system.

The SIB provides an interface to the software for control 
of the PCCs. It generates IMB requests for the PCCs and
performs the tasks required to manage the asynchronous 
nature of the interface. 

Since the PCCs and the ATP bus are byte-oriented while
the IMB is word-oriented (two bytes), the SIB performs the
byte packing and unpacking to translate from one bus to the
other. It also allows data transfers to start or stop in either
the left or right bytes of the two-byte words in system
memory.
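
The packing step can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the `fill` byte stands in for whatever the unused half of a word holds (real hardware would presumably leave that byte of memory untouched).

```python
def pack_bytes(data, start_right=False, fill=0x00):
    """Pack a byte stream into 16-bit words, left (high) byte first.

    If start_right is True, the first character goes in the right byte
    of the first word and the left byte is represented by `fill`.
    """
    stream = list(data)
    if start_right:
        stream.insert(0, fill)
    if len(stream) % 2:
        stream.append(fill)          # transfer stopped in a left byte
    return [(stream[i] << 8) | stream[i + 1] for i in range(0, len(stream), 2)]

def unpack_words(words, start_right=False, count=None):
    """Inverse of pack_bytes: recover `count` bytes from 16-bit words."""
    stream = []
    for w in words:
        stream += [(w >> 8) & 0xFF, w & 0xFF]
    if start_right:
        stream = stream[1:]
    return bytes(stream if count is None else stream[:count])
```

For example, `pack_bytes(b"ABC")` gives two words, 0x4142 and 0x4300, while starting in the right byte gives 0x0041 and 0x4243.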

As the controller of the ATP bus, the SIB polls each AIB
many times each second to see if one of the PCCs has data for
or needs data from system memory or wants to generate a
software interrupt. The SIB generates software interrupts
upon request of a PCC. The PCC's identity number and a
byte of status information are presented to software at the
IMB interface. A first-in, first-out (FIFO) queue is maintained
to smooth out peaks in the interrupt workload. To
prevent interrupt overruns, polling on the ATP bus is
suspended when the FIFO is full. The worst-case maximum
data rate on the ATP bus is about 150,000 characters
per second.
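
The interaction between the interrupt FIFO and polling can be pictured with a small model. This is a sketch under stated assumptions: the class and method names are hypothetical, and the FIFO depth is a parameter because the article does not give one.

```python
from collections import deque

class InterruptFifo:
    """Bounded queue of (pcc_id, status) pairs awaiting software."""

    def __init__(self, depth):
        self.depth = depth           # hypothetical; the text gives no depth
        self.queue = deque()

    @property
    def polling_enabled(self):
        # Polling is suspended while the FIFO is full, so no request
        # can arrive that would overflow it.
        return len(self.queue) < self.depth

    def post(self, pcc_id, status):
        """Called when polling finds a PCC requesting an interrupt."""
        if not self.polling_enabled:
            raise RuntimeError("polling should be suspended when full")
        self.queue.append((pcc_id, status))

    def service(self):
        """Software takes the oldest pending interrupt."""
        return self.queue.popleft()
```

Because a PCC's request simply waits until polling resumes, a burst of interrupts is delayed rather than lost, which is the smoothing effect the text describes.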

Each PCC is provided with three paths into system memory.
The SIB implements these paths via the IMB's direct
memory access (DMA) facility. This implementation allows
the ATP to handle heavy I/O loads with very low overhead.
All of the overhead on an ATP I/O transfer is incurred to
start and stop the transfer. Once the transfer starts, the only
overhead is one IMB DMA cycle for each two characters.

The SIB is designed to be easy to manufacture and support.
The basic design is conservative, making little use of
exotic or high-speed parts. The ability to diagnose a failed
board has been designed into the ROM-based state machine
that operates the board.



Asynchronous Interface Board (AIB) 

Each AIB contains an interface to the ATP bus, twelve 
PCC modules, one modem controller chip (MCC) module, 
and the circuits that allow the PCC modules to share the 
ATP bus and the MCC. The AIB also contains the circuits 
that generate the UART baud rates and drive the cables to
the junction panels. 

The PCC and MCC modules and the ATP bus interface are
attached to a bus on the AIB. Access to this bus is controlled
by a ROM-based state machine. The state machine arbitrates
requests for the AIB bus from the PCC modules, the MCC
module and the ATP bus interface.

[Fig. 2 labels: Junction Panel, Data Communications, Modem Motherboard, Ribbon Cable, Direct Connect Motherboard]

Fig. 2. Advanced terminal processor junction panel mini-boards
carry four direct connection ports or two modem ports each.

24 HEWLETT-PACKARD JOURNAL MARCH 1982

© Copr. 1949-1998 Hewlett-Packard Co.

The MCC is another 6801 microcomputer. Its function is 
to allow each PCC to control the signals that operate a 
device connected to that port, usually a modem. Each PCC
may read seven input signals and set eight output signals. 
This capability allows the PCC to control almost all devices 
offering a serial I/O capability. 

The UART on the MCC is used to multiplex the 180 
device control signals between the MCC and the junction 
panel. Use of the UART as a multiplexer eliminates the need 
for external multiplexer logic or a very large and expensive 
cable. 
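
The figure of 180 signals appears to follow from the per-port signal counts given above multiplied by the number of ports on an AIB. That reading is an inference rather than something the article states, but the arithmetic agrees:

```python
PORTS_PER_AIB = 12        # twelve PCC modules per AIB
INPUTS_PER_PORT = 7       # signals each PCC may read
OUTPUTS_PER_PORT = 8      # signals each PCC may set
MAX_AIBS = 8              # AIBs per system interface board

signals_per_aib = PORTS_PER_AIB * (INPUTS_PER_PORT + OUTPUTS_PER_PORT)
ports_per_atp = MAX_AIBS * PORTS_PER_AIB

print(signals_per_aib)    # 180
print(ports_per_atp)      # 96
```

The second product matches the 96 terminal ports quoted for one ATP subsystem.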

The AIB is designed to take advantage of the PCCs and the
MCC to diagnose much of its own logic. Extensive use is
made of the 6801 microcomputers to loop data from
system memory through the AIB's logic and back to system
memory. This technique provides good troubleshooting
facilities without adding special logic for testing.



Junction Panels 

The ATP provides two types of junction panels. Direct 
connect junction panels are used to connect terminals in the 
local area. Modem junction panels provide the control sig- 
nals required to operate full-duplex modems, allowing
remotely located terminals to be connected to the HP 3000 via
the public telephone system. 

The ATP design calls for a high-speed, long-distance
direct connection facility. This facility is implemented
using EIA standard RS-422. The standard allows operation
at a line speed of 100,000 bps at 4000 feet (1200 meters).
However, since most terminals now available offer only
RS-232-C interfaces, the ATP junction panels allow both
types of interfaces, and interfaces can be easily mixed.

The ATP direct connection junction panels are implemented
as small boards that plug into a motherboard.
Each mini-board carries four ports, either RS-422 or
RS-232-C.

To allow the maximum number of ports in the least possible
space, a new more compact connector was designed to
replace the 25-pin D subminiature connector in common
use. The new connector uses only three wires for RS-232-C
and five wires for RS-422. It is fully shielded and carefully
grounded to insure that no radio frequency interference
(RFI) is generated and that the system has the minimum
possible susceptibility to external electrical influences.

The modem junction panels can also support a mixture of
two interface standards. The modem junction panel uses
the same mini-board concept as the direct connection
panels, except that the modem mini-boards can carry only
two ports per board because of the large size of the standard
modem connectors.

On the modem junction panel, another 6801 demultiplexes
the signals from the modem controller chip. This
microcomputer is called the modem scanner chip. The port
controller chip notifies the modem scanner chip (through
the MCC) which input signals to examine, the expected
state of these signals, and how to set the output latches.

The modem scanner chip scans the modem input signals,
notices and debounces any changes and reports the changes
to the MCC, which passes them on to the PCC.



ATP Software 

The principal objective of the ATP software design is to
provide a structure that will be easy to support and enhance.
Most of the costs associated with a software product
are incurred to fix problems in the original implementation
and to add new features after the initial release. The ATP
software was designed with a modular structure to facilitate
this process.

The most desirable enhancement to HP 3000 terminal I/O
has been a more flexible terminal-type facility. To achieve
this goal in the ATP, the control of a port's characteristics is
table-driven. Each of the current terminal types corresponds
to one table. Changing one of these tables or adding
a new table is a simple task.
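
The idea of a table-driven terminal-type facility can be sketched as follows. Everything in this example is hypothetical: the field names, the type numbers, and the values are invented for illustration and do not describe the actual MPE tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TerminalType:
    """One table entry; every field name here is hypothetical."""
    line_speed_bps: int
    parity: str                 # "none", "even", or "odd"
    handshake: str              # "ENQ/ACK" or "X-ON/X-OFF"
    end_of_block: int           # character that ends a block-mode read

# Invented type numbers and settings, for illustration only.
TERMINAL_TYPES = {
    1: TerminalType(9600, "none", "ENQ/ACK", 0x1E),
    2: TerminalType(2400, "even", "X-ON/X-OFF", 0x0D),
}

def configure_port(type_number):
    """Look up a port's characteristics instead of hard-coding them."""
    return TERMINAL_TYPES[type_number]

# Adding a terminal type is a data change, not a code change.
TERMINAL_TYPES[3] = TerminalType(19200, "odd", "X-ON/X-OFF", 0x04)
```

The design point is that the code path in `configure_port` never changes when a terminal type is added or modified.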

In addition, the ATP software is designed to be fail-safe.
When the MPE software detects a serious problem, all pro- 
cessing is stopped and a system failure is reported. When 
the ATP software detects a problem that affects only one 
port, only that port's operations are stopped. All other pro- 
cessing continues. An on-line facility allows the system 
manager or the HP customer engineer to record the state of 
that port for later analysis. The on-line facility also allows 
the port to be reset without requiring the system to be 
warm-started. 

The ATP does not support some obsolete features of the
earlier controllers, including half-duplex modems, tape
mode, and several obsolete HP terminals.

GUEST — A Signature Analysis Based Test
System for ECL Logic 

by Edward R. Holland and James L Robertson 



TESTING OF LOADED PRINTED CIRCUIT BOARD
assemblies is vital in the production of any computer
system. For effective testing of HP 3000 Series 64
boards, a special tester was designed. This system, the
Gemini Universal ECL Signature Test system, or GUEST,
is able to capitalize on the HP 5005 Signature Multimeter¹
and the built-in shift string organization of the flip-flops
on the boards under test (see article, page 11).

Since all ECL networks in the Series 64 must be termi- 
nated in their characteristic impedances to suppress reflec- 
tions, a tester running at slow clock speeds, as most board 
testers do, would not detect termination and other timing
problems. A tester that functions at real-time clock rates 
was required and the GUEST system meets that need, being 
able to operate at clock rates up to 25 MHz. 

In board testing, a test vector is a set of input states 
applied by a tester to a unit being tested. Generation of test 
vectors for a given unit under test (UUT) is often a difficult 
and time-consuming operation. In the GUEST system, vec- 
tors are generated algorithmically in real time by hardware. 
Therefore, preparation of test programs is reduced from 
weeks or months to a few hours. This has made the GUEST 
test system a useful tool throughout the prototype design 
process of the Series 64. By contrast, most test systems are
usable only after the design is complete and all the test 
vectors have been generated, a point that is reached typi- 
cally very late in a project. 

Computer-Driven Test System 

The GUEST test system is interfaced to the controlling HP
1000 Computer through the HP-IB (IEEE 488) interface bus
(see Fig. 1). The primary functions of the computer system
are to initiate testing and to guide the operator in probing of
a defective UUT. One computer is able to handle up to six
GUEST test stations, or a combination of GUEST test stations
and DTS-70² test stations with all stations active at the
same time.

The GUEST test system is interfaced to the UUT by a
personality board as shown in Fig. 2. Building the personality
board for ECL boards requires simply loading wire
jumpers on a universal board. There is one set of jumpers for
UUT input pins and another set of jumpers for pins that
require terminations.

Hardware Test Vector Generation 

The test vectors used in the GUEST system are generated
as a serial bit stream by a 13-bit linear feedback shift register,
which produces an 8191-bit-long pseudorandom sequence
(see Fig. 2). This bit stream is shifted through a
588-stage shift register on the GUEST board and then via the
personality board through the shift string of the UUT. Shifting
continues for 1023 clock cycles. Then on the 1024th
clock cycle, all of the shift register elements on both the
GUEST board and the UUT are parallel-loaded under control
of the diagnostic control unit shift line. The next 1023
clock cycles then cause the data loaded in the parallel
operation to be shifted out of the combined registers past
the GO/NOGO point. The test runs for a total of 8,379,393
(8191 x 1023) shifting clock cycles so that each of the inputs
to the board is driven by all of the 8191 bits of the
pseudorandom sequence and each output of every shift
string flip-flop on the board is also driven by all 8191 bits.
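
A software model makes the 8191-bit sequence length concrete. The sketch below is a minimal 13-bit linear feedback shift register. The feedback polynomial, x^13 + x^4 + x^3 + x + 1, is an assumption: it is a standard maximal-length choice for 13 bits, but the article does not say which polynomial the GUEST hardware uses.

```python
def step(state):
    """Advance a 13-bit LFSR by one clock (assumed polynomial, see above)."""
    feedback = (state ^ (state >> 9) ^ (state >> 10) ^ (state >> 12)) & 1
    return (state >> 1) | (feedback << 12)

def sequence_length(seed=1):
    """Count clocks until the register returns to its starting state."""
    state, count = step(seed), 1
    while state != seed:
        state, count = step(state), count + 1
    return count

SEQ_LEN = sequence_length()          # 2**13 - 1 for a maximal-length LFSR
SHIFTS_PER_PASS = 1023               # shifting clocks between parallel loads

print(SEQ_LEN, SEQ_LEN * SHIFTS_PER_PASS)   # 8191 8379393
```

The product reproduces the 8,379,393 shifting clock cycles quoted in the text.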

Go/No-Go Testing 

As the data stored in the shift registers on the 1024th
clock cycle is shifted out past the GO/NOGO test point, the
HP 5005 Signature Multimeter computes the signature of the
data stream and the computer checks this signature against
the expected signature stored for the board type being
tested. Since both the GUEST and the UUT shift registers
are shifted past this test point, any incorrect value from any
part of them causes an incorrect GO/NOGO signature. If
the board passes the GO/NOGO test it is ready for installation
in a system. If the board fails this test, additional testing is
done on the GUEST tester.

The control logic of the GUEST tester can limit the signature
analyzer to only the last N data outputs of the shift
string, where N is specified by the computer. When N is the
maximum value, the signature analyzer clock is always
enabled and the signature depends on all bits in the shifted
outputs. As N is decreased, fewer and fewer outputs enter
into the signature. At some value N=n the signature will be
incorrect, while at N=n-1 the signature will be found to
equal the expected value. Since there is a one-for-one mapping
of the value of N and a point in the overall shift strings,
it is possible to pinpoint the fault to an I/O pin or an input to
a shift string bit on the UUT. The computer is programmed
to use a binary search algorithm and takes ten signatures or
less to find the correct point in the shift string to begin
further backtracing.
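
The search for n can be sketched as an ordinary binary search. The function name and the `signature_is_bad` callback are hypothetical stand-ins for the computer asking GUEST to take a signature at a given N. The upper limit of 1023 used in the usage note is an assumption (one shifting pass), chosen because it is consistent with the "ten signatures or less" figure.

```python
def find_first_bad(max_n, signature_is_bad):
    """Return the smallest N whose signature is wrong, and the probe count.

    Assumes the signature is good for every N below the fault and bad
    for every N at or above it, and that N = max_n is already known bad
    (the GO/NOGO test failed).
    """
    lo, hi, probes = 1, max_n, 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if signature_is_bad(mid):
            hi = mid            # fault is at mid or earlier
        else:
            lo = mid + 1        # fault is later than mid
    return lo, probes
```

With `max_n = 1023`, a call such as `find_first_bad(1023, lambda n: n >= 700)` locates the fault at 700 and never needs more than ten signatures, whichever position is faulty.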



Fig. 2. Test vectors are generated algorithmically by GUEST and
loaded into the shift string of the HP 3000 Series 64 board being
tested. The data is then shifted past the GO/NOGO point, where the
5005 Signature Multimeter generates a signature that tells whether
or not a problem exists on the board.



Backtracing 

While backtracing faults to the component level, the signature
at each node is dependent only on combinational
functions of the pseudorandom test vectors applied to the
UUT input pins and shifted into the internal shift register
states of the UUT. The signatures are measured by moving
the signature analyzer probe to each node as guided by the
computer.

Backtracing starts at the node determined by the GO/NOGO
binary search. When a node is found that has an
incorrect signature but all inputs affecting that node have
correct signatures, the faulty node has been isolated.
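
The isolation rule lends itself to a short sketch. Here `inputs_of` and `signature_ok` are hypothetical stand-ins for the board topology data and the operator's probe readings, and the node names in the usage note are invented.

```python
def backtrace(start, inputs_of, signature_ok):
    """Follow bad signatures upstream until a node has only good inputs.

    `start` is a node already known to have a bad signature. inputs_of
    maps each node to the nodes that drive it; signature_ok reports the
    probed result for a node.
    """
    node = start
    visited = {node}
    while True:
        bad_inputs = [n for n in inputs_of.get(node, [])
                      if not signature_ok(n) and n not in visited]
        if not bad_inputs:
            return node          # bad node, all inputs good: fault isolated
        node = bad_inputs[0]
        visited.add(node)
```

For instance, with `inputs_of = {"U7.3": ["U5.2", "U6.4"], "U5.2": ["U1.1", "U2.6"]}` and bad signatures at U7.3 and U5.2 only, starting at U7.3 isolates U5.2.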

Feedback Loops 

To make effective use of signature analysis, all feedback
loops should be broken. If a feedback loop is not broken, a
bad signature anywhere in the loop causes all other nodes
in the loop to have bad signatures, and fault isolation to the
component level is impossible. Feedback loops are broken
during backtracing by the internal UUT shift registers. This
is accomplished by clocking the signature analyzer when
the outputs of each shift register are dependent only on its
serial input and not on its parallel data inputs. Since bad
signatures cannot propagate through these shift registers,
any feedback path from an output to a parallel data input is
broken.

Clock and Shift Control Faults 

Another complication in the fault isolation process is that
of diagnosing faults in the clock and shift control circuits of
the UUT. Since data in these areas is normally synchronous

Designing for Testability with GUEST

by Karen L Meinert 



The HP 3000 Series 64 and the GUEST system were designed 
at the same time to work together. It was discovered that certain
design issues had to be considered when designing a Series 64 
printed circuit assembly (PCA) if maximum GUEST testability was 
to be achieved. 

High Fan-in 

Fan-in is defined as the number of inputs to a circuit that is
needed to produce an output. Because GUEST outputs only 8191
predefined test vectors, a fan-in of 14 or greater makes it impossible
to sequence through all combinations (2^14 is greater than
8191). If the fan-in is less than 10 it is safe to assume that all
combinations will appear in the test sequence. Modifications to
the board or the personality board can be made to assure that all
combinations are tried when the fan-in is between 11 and 13.
There are also several signals (mostly high, mostly low, pulse
high, pulse low) available on the personality board; these can be
used to drive inputs to increase a circuit's effective test vectors.

It is important that the 13 inputs to a high-fan-in circuit be driven
by contiguous bits from GUEST. The bits in the shift string and on
the edge connectors of GUEST are just shifted versions of the 13
bits that guarantee the 8191 test vectors. If 13 bits are selected at
random, there is no guarantee that all 8191 combinations will
appear.
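
The contiguous-bits guarantee can be checked with a small model. The sketch below generates one period of a 13-bit maximal-length sequence and counts the distinct patterns seen by a circuit wired to a given set of bit offsets. The feedback polynomial (x^13 + x^4 + x^3 + x + 1) and the function names are assumptions for illustration; the article does not give the polynomial GUEST uses.

```python
def step(state):
    """One shift of a 13-bit maximal-length LFSR (assumed polynomial)."""
    feedback = (state ^ (state >> 9) ^ (state >> 10) ^ (state >> 12)) & 1
    return (state >> 1) | (feedback << 12)

def output_bits(length, seed=1):
    """Return `length` successive output bits of the LFSR."""
    bits, state = [], seed
    for _ in range(length):
        bits.append(state & 1)
        state = step(state)
    return bits

def patterns_seen(offsets, seq_len=8191):
    """Distinct input patterns seen by a circuit wired to the given offsets."""
    bits = output_bits(seq_len)
    return len({tuple(bits[(t + k) % seq_len] for k in offsets)
                for t in range(seq_len)})

print(patterns_seen(range(13)))   # 8191: every pattern except all zeros
print(patterns_seen(range(14)))   # still 8191, far short of 2**14 = 16384
```

Thirteen contiguous bits correspond to successive register states, so each of the 8191 nonzero patterns appears exactly once per period, while a fan-in of 14 cannot be exercised exhaustively.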

Feedback Loops and Shift Strings 

For a PCA to be GUEST-testable it is not necessary that it have
shift strings (see article, page 11). The shift strings are an effective
means to break up feedback loops and initialize memory elements.
If no feedback loops exist on a board and memory can be
initialized, there is no need for shift strings.

Nonstandard Clocking 

Both the GUEST shift register and the signature analyzer are
clocked on the standard system clock. If other phases or edges of
the clock are used, testing problems may occur. Feedback loops
are not necessarily broken if shift string registers are clocked on
phases other than that used by GUEST. Also, there could be
problems in getting information between the PCA and the GUEST
system in time if other phases are used. In general, clocks other
than the standard system clock should be avoided when designing
PCAs to be tested by GUEST.

Non-ECL Circuitry 

GUEST was designed to be an ECL circuit tester. To test circuits
other than ECL, translator packs are used on special personality
boards that adapt each tested board to GUEST. In addition,
three-state buses should not be allowed to float to the
high-impedance state or unstable signatures will result. The solution is
to have GUEST drive the bus whenever the board is not. This is
accomplished by bringing the three-state bus enable signal to an
unused I/O pin on the PCA. This signal is inverted and used on the
personality board to enable the translator direction (ECL to TTL or
vice versa).

TTL signals can be probed by the HP 5005 Signature Multi- 
meter, since this analyzer is capable of changing its input voltage 
threshold to allow for non-ECL signals. This can be done either 
manually or under automatic control through the HP-IB interface. 

RAMs 

To test a RAM completely each location must be addressed five
times, once each to write one, write zero, read one, read zero, and
deselect. Since GUEST outputs a total of only 8191 test vectors,
RAMs larger than 1024 bits deep cannot be tested completely. To
obtain known signatures on the outputs of RAMs, any location that
is read must first have been written. If a location is read that has
not been written, the read data is whatever was in the RAM at
power-up. This is random data and therefore has a random signature.
A signal called RAMINIT (RAM INITialize) is generated by
GUEST to assure that all read locations are initialized. GUEST
generates a pseudorandom number sequence with a length
equal to half of a signature analyzer cycle. This sequence is
repeated during the second half of the cycle. RAMINIT is high for
the first half-cycle and low for the second half-cycle during the
time that the RAMs can be read or written into. RAMINIT is used in
the write enable circuitry to force a write when high and to enable
reads and writes when low. In this way every location that is
addressed in the first half is written with known data. When these
addresses repeat themselves in the second half (they are part of
the repeated pseudorandom sequence), where reads may occur,
the locations have been initialized, and known, repeatable signatures
will occur.

with the signature analyzer clock, only SA0 and SA1 (stuck
at 0 and stuck at 1) signatures would be found and normal
backtracing would be impossible. To isolate this type of
fault, the first backtrace signature measured after the
GO/NOGO test fails is at the GO/NOGO test point. A correct
signature here indicates that the UUT clock and shift control
circuits are functioning. If the test fails, the shift register
serial outputs are probed using a binary search algorithm
until a good signature is found and the following device's
serial output is bad. Clock and shift control inputs to this
earliest failing shift register are probed while pseudorandom
data is applied to the clock and shift inputs of the UUT.
If no bad inputs are found, the shift register is suspected to
be bad. If a clock or shift control input fails, backtracing
starts at that point, with pseudorandom data still applied to
the clock and shift inputs of the UUT, and continues until
the failing node is isolated.



Test Report 

When the faulty node has been isolated, three additional
measurements are made on the suspected node. They are:
+ peak voltage, - peak voltage, and resistance to ground. A
test report with information pertinent to the failing node is
then printed. The failure descriptions can be: NODE MOVES,
NODE STUCK, NODE UNSTABLE, or OPEN TRACE. The information
on the test report can be used to further diagnose the
fault (shorted nodes, open terminations, etc.).

Two assumptions made so far are that the probe has made
good contact with the node being tested, and that the
operator probes the correct point when prompted. This is
not always the case, so provisions have been made for
recovery from probing contact errors. To minimize operator
misprobes, all pertinent IC pins are probed in sorted ascending
order before moving on to the next IC. Thus,
pin-to-pin and IC-to-IC movement is minimized. Each node is
probed at least twice in different locations to diagnose
misprobes or open traces. If a misprobe is discovered before it is
diagnosed by the computer, the operator can ask to reprobe
the point where the error was made.

Edge connector pins, resistor pins, or any other point that 
is difficult to locate or probe can be declared inaccessible. 
During backtrace, the operator will not be prompted to 
probe these points. When the test report is printed, any 
inaccessible point that could have caused the fault is listed 
as not checked. 

Creating the Test File 

Before using GUEST, a UUT test file must be created by
TESTAID.³ To use TESTAID, topology information (a
description of the types and interconnections of all integrated
circuits on the UUT) must be supplied by the test programmer.
This data is available from the computer routing and
placement program used in the design of Series 64 boards,
and is converted by a utility program to the format required
by TESTAID. No other input data is required for TESTAID
simulation because all test vectors are generated by the
GUEST hardware.

After the preliminary test file has been created by TESTAID,
UUT-dependent test setup parameters need to be
added (clock period, reference voltages, special node
definitions, etc.). Signatures can then be added as measured from
a known good UUT. The addition of all GO/NOGO signatures
is automatic and requires approximately two hours run
time. Backtrace signatures are added by manually probing
each internal node as guided sequentially by the computer.
This process also requires approximately two hours.

Once generated, the UUT test file should be verified for
accuracy of fault isolation and fault coverage. Fault isolation
accuracy can be verified by inserting a few known
faults in critical areas of the UUT and determining whether
they are diagnosed correctly. Fault coverage is verified by
measuring the GO/NOGO signature repeatedly while probing
internal nodes, as guided by the computer, with a special
probe that forces each node both high and low. If the
GO/NOGO test passes, the stuck at 1 or stuck at 0 node fault is
marked as not detected. When probing is complete, the fault
detection percentage is computed and saved in the UUT test
file. Typical fault coverage is in the order of 99% for ECL
boards.

The final step to be done when creating a test file is
documentation. By entering the DOCALL command, complete
documentation of the UUT test file (all node drivers,
receivers, terminations, special setup, signatures, etc.) can
be listed on the line printer.

Packaging the HP 3000 Series 64



by Manmohan Kohli and Bennie E. Helmso 



THE HP 3000 SERIES 64 is the new highest-performance
member of the HP 3000 Computer family. It
has been designed to retain certain visual aspects 
of current HP systems and peripherals for visual compati- 
bility, but also incorporates new directions for future prod- 
ucts. The Series 64 is intended to be used in an EDP envi- 
ronment and a cost-effective package design has been 
applied to optimize reliability and serviceability. 

New tubular welded frames, preassembled formed
cardcages, bus bars and modular junction panels were developed
as part of this cost-effective package. New two-piece
press-fit connectors for printed circuit boards and the
backplane and an effective cooling scheme were developed
for higher reliability.

The cabinet has two open frames that are bolted together
in the front and rear. These provide complete access to
various assemblies including backplanes for servicing (Fig. 1).
A 36-slot card cage contains the CPU, memory and I/O
adapter cards and a second 24-slot card cage contains I/O
device cards. The power system is designed to support the
maximum configuration.

Separate cooling systems are used to cool printed circuit 
boards and power supplies. Cooling air is drawn from the 
rear and is then forced through the card cages for proper 
cooling of the printed circuit boards. Bus bars are used 
between backplanes and power supplies to minimize cable 
harnesses and to facilitate replacement of power supplies. 
I/O junction panels are modular and provide maximum 
flexibility for various system configurations. 

The system cabinet is designed to meet HP Class C en- 
vironmental specifications. This requires the system to op- 
erate at extreme temperature and humidity, between sea 
level and high altitude, and exposed to shock and vibration.



Cooling System 

The cooling system is designed to meet two major objec- 
tives under worst-case conditions: 

■ Transistor junction temperatures not to exceed BEV^C for 
higher reliability 

■ Air temperature rise not to exceed 10°C for adequate
noise margin. 

The CPU boards contain mostly ECL chips with an average
heat dissipation of 0.5 watt/chip and 45 watts/board.
However, the total heat dissipation from the
maximum-configured CPU, memory, and I/O adapter is 1300 watts.
These parameters presented some challenges in designing
the cooling system. To meet the above objectives, five
16.5-cm-diameter tubeaxial fans are used to deliver
high-velocity air at 2.3 m/s for cooling the ECL chips.
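
As a rough check on these figures, the air temperature rise across a cardcage can be related to heat load and airflow by a simple energy balance, P = rho * cp * Q * dT. The sketch below applies it to the 1300-watt load and the 10-degree limit quoted in the text. The air density and specific heat are assumed sea-level values, not figures from the article, and the function names are illustrative only.

```python
# Energy balance for forced-air cooling: P = rho * cp * Q * dT
# Assumed air properties (not from the article): sea level, about 20 C.
AIR_DENSITY = 1.2           # kg/m^3 (assumption)
AIR_SPECIFIC_HEAT = 1005.0  # J/(kg*K) (assumption)


def required_airflow(power_w, max_rise_c,
                     rho=AIR_DENSITY, cp=AIR_SPECIFIC_HEAT):
    """Volumetric airflow in m^3/s that keeps the air rise at max_rise_c."""
    return power_w / (rho * cp * max_rise_c)


def air_temperature_rise(power_w, airflow_m3s,
                         rho=AIR_DENSITY, cp=AIR_SPECIFIC_HEAT):
    """Air temperature rise in degrees C for a given load and airflow."""
    return power_w / (rho * cp * airflow_m3s)


if __name__ == "__main__":
    # 1300 W total dissipation and 10 C allowed rise, both from the text.
    flow = required_airflow(1300.0, 10.0)
    print(f"required airflow: {flow:.3f} m^3/s")
```

With these assumed air properties the result is about 0.108 m^3/s. Because air density falls with altitude, the same heat load needs a larger volumetric flow at high altitude, which is one reason a worst-case design point matters.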

Cardcages are cooled by taking air in from the rear and
then forcing it up through the cardcage. A plenum is used
between the fans and the cardcage to help distribute air
across the cardcage more evenly. The fans are inside the
plenum, which is lined with acoustic foam to reduce noise.

Two sets of overtemperature switches are mounted on the
two cardcages. When the temperature reaches the low set
point (40°C), both audio and visual alerts to the operator are
initiated. The system is shut down if the cardcage
temperature reaches the high set point (50°C).
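
The two-threshold behavior described above can be sketched as a small decision function. The set points come from the text; the names used here are hypothetical, and the actual system uses hardware temperature switches rather than software.

```python
from enum import Enum

LOW_SET_POINT_C = 40.0   # alert threshold, from the text
HIGH_SET_POINT_C = 50.0  # shutdown threshold, from the text


class Action(Enum):
    NORMAL = "normal"
    ALERT = "alert"        # audio and visual alerts to the operator
    SHUTDOWN = "shutdown"  # system is shut down


def overtemperature_action(cardcage_temp_c):
    """Map a cardcage temperature to the response described in the text."""
    if cardcage_temp_c >= HIGH_SET_POINT_C:
        return Action.SHUTDOWN
    if cardcage_temp_c >= LOW_SET_POINT_C:
        return Action.ALERT
    return Action.NORMAL


if __name__ == "__main__":
    for temp in (35.0, 42.5, 50.0):
        print(temp, overtemperature_action(temp).value)
```

Checking the high set point first means a temperature at or above 50°C is always reported as a shutdown rather than as a mere alert.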

Electromagnetic Compatibility Design 

To decrease emission of electromagnetic energy and
susceptibility to it, the logic ground is isolated from the frame
ground. This is achieved by electrically isolating all printed
circuit boards and backplanes from the sheet-metal
cardcages. Radiated emissions are minimized by providing
and connecting ground planes in the printed circuit boards
and backplanes. Susceptibility is minimized by grounding
the frames, exterior panels, and internal assemblies to each
other through plated interfaces. Beryllium-copper clips
mounted on all exterior panels make contact with the plated
frame members to provide grounding to reduce susceptibility
to electrostatic discharge. Three isolation transformers,
one for each phase, provide good common-mode noise
rejection, lessen susceptibility to electromagnetic
interference, and ensure that ac line leakage current is well
within safety requirements.

Fig. 2. Typical HP 3000 Series 64 printed circuit board
construction. Dimensions are in inches.

Boards and Connections 

Designing and making large, high-density,
controlled-impedance-interconnect printed circuit boards presented
some new challenges. The design parameters that were
traded off to arrive at final design rules were board size,
minimum trace width and spacing, IC package density,
trace routability, and raw board testability. What resulted
were boards approximately 36 cm square with an average of
120 IC packages per board. Minimum trace width and
spacing was set at 0.18 mm (0.007 in). A 2.5-mm (0.100-in) grid
system for component holes was employed to facilitate
testing of unloaded boards before assembly.

For density and routability considerations only two
signal layers are used. These are on the outer layers, leaving
the inner layers for power and ground distribution. The
typical board construction is shown in Fig. 2. It should be
noted that the signal lines are in the standard microstrip
configuration with nominal characteristic impedance of
approximately 68 ohms.
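
To relate the 68-ohm figure to board geometry, a commonly used closed-form approximation for surface microstrip is Z0 = 87 / sqrt(er + 1.41) * ln(5.98 h / (0.8 w + t)). The sketch below applies it to the 0.007-inch minimum trace width from the text. The dielectric constant, dielectric height, and copper thickness are assumed values chosen only for illustration, since the dimensions of Fig. 2 are not reproduced here, and this formula is not necessarily the one the designers used.

```python
import math


def microstrip_impedance(width, height, thickness, er):
    """Approximate surface-microstrip impedance in ohms.

    width, height, and thickness share one length unit. The
    approximation is usually quoted for 0.1 < width/height < 2.0.
    """
    return (87.0 / math.sqrt(er + 1.41)) * math.log(
        5.98 * height / (0.8 * width + thickness)
    )


if __name__ == "__main__":
    trace_width_in = 0.007     # minimum trace width, from the text
    dielectric_in = 0.008      # assumed height above the plane
    copper_in = 0.0014         # assumed 1-oz copper thickness
    dielectric_constant = 4.7  # assumed glass-epoxy laminate

    z0 = microstrip_impedance(trace_width_in, dielectric_in,
                              copper_in, dielectric_constant)
    print(f"Z0 is roughly {z0:.0f} ohms")
```

With these assumed values the estimate lands near the nominal 68 ohms; a thinner dielectric or a wider trace lowers the impedance.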

For the interconnect design of the large number of
individual circuits, manual design was ruled out, largely
because of time constraints. A sophisticated design
automation approach was selected to provide automated
placement of components and routing of traces after initial data
entry of schematic and mechanical information.

Interconnecting all the boards in a system this large also
presented some special problems. Board-to-board
interconnection in the CPU and memory sections requires just
under 10,000 pins. A pin-and-socket (two-piece) connector
system was selected. It was felt that the conventional
card-edge (one-piece) system had some problems (such as board
edge contamination from soldering and handling, gold
plating cost, thickness and quality control) which the
two-piece system minimizes. For example, all connector plating
is done by the connector manufacturer, and all gold plating
is eliminated from the printed circuit boards.

The board-to-board interconnection is accomplished by
means of a backplane assembly. In addition to providing for
the interconnection of signal lines, the backplane also
distributes dc power to the individual circuit boards. The
construction details of the backplane are shown in Fig. 3.
There are four signal layers, two ground planes, and four
power planes. The outer layers are microstrip transmission
lines and the inner signal layers (layers 3 and 8) are
conventional striplines. The power and ground planes are
duplicated to aid in power distribution. Because of the number
of layers and the large amount of copper in the inner planes,
conventional connectors soldered into the backplane were
not practical. The problems of thermal stresses to the
backplane assembly during soldering and inability to
remove or replace connectors because of the heat-sink effect
of the inner planes led to the selection of press-fit
connections between the connectors and the backplane. An
interference fit between the connector pins and holes in the
backplane produces a gas-tight connection and eliminates
the need for soldering the pins to the backplane. The
board-to-backplane connection is shown in Fig. 4.

Fig. 3. HP 3000 Series 64 backplane construction.
Dimensions are in inches.

Fig. 4. Board-to-backplane connection method is gas-tight
and eliminates soldering.

Acknowledgments 

The development of the new cabinet and packaging
required contributions from many people. Contributions
were made by Bob Cook on packaging and serviceability,
Steve Spelman on industrial design and plastic parts,
George Canfield on power distribution, and Katie Torres for
most of the drawings and documentation. Jim Brannan was
responsible for most of the product qualification testing.
Barb Gee provided the manufacturing support and Toby
Huff the service engineering support. Guidance and support
were provided by Peter Rosenbladt, R&D section manager.



PRODUCT INFORMATION

HP 3000 Series 64 Computer System

MANUFACTURING DIVISION:
Computer Systems Division
19447 Pruneridge Avenue
Cupertino, California 95014 U.S.A.
SPECIFICATIONS: HP Publication No. 5953-DB5S
PRICES IN U.S.A.: 32460A HP 3000 Series 64
System Processing Unit, $164,700. 30143A I/O
Adapter Module, $10,000. 30142A 1M-Byte
Memory Module, $15,000. Disc drives, terminals,
and other peripherals additional.



Manmohan Kohli

Manny Kohli is a native of Punjab, India
and received the BSME degree from
Punjab University in 1964. He earned
the MSME degree at the University of
California at Berkeley in 1967 and then
did mechanical design of impact printers
and high-speed mechanisms and
stress analysis of aircraft interiors
before joining HP in 1974. Manny
designed the cabinet for the HP 3000
Series 33/44, worked on the thermal
printer used in the HP-91 and HP-97
Calculators, and is project manager for
the industrial/product design of the HP
3000 Series 64. Manny enjoys working
on home projects, tinkering with cars, camping, and reading. He is
married, has two children, and lives in San Jose, California.

Bennie E. Helmso

Ben Helmso joined HP in 1960 after
receiving the BSEE degree from the
University of California at Berkeley. During
his more than twenty years at HP he has
worked on oscilloscope, microwave
sweeper, and wristwatch product
designs and managed manufacturing
engineering and quality assurance
groups. At present, Ben is doing RF
signal generator product design. He is
named inventor on a patent related to
the HP-01 watch band. Ben was born in
Los Angeles, California and served two
years in the U.S. Navy before studies for
his BSEE degree. He is married, has
two children, and lives in Spokane, Washington. His interests include
camping, boating, hunting, and fishing.





Hewlett-Packard Company, 3000 Hanover
Street, Palo Alto, California 94304



HEWLETT-PACKARD JOURNAL

MARCH 1982 Volume 33 • Number 3

Technical Information from the Laboratories of
Hewlett-Packard Company

Hewlett-Packard Company, 3000 Hanover Street
Palo Alto, California 94304 U.S.A.

Hewlett-Packard Central Mailing Department
Van Heuven Goedhartlaan 121
1181 KK Amstelveen, The Netherlands

Yokogawa-Hewlett-Packard Ltd., Suginami-Ku, Tokyo 168 Japan

Hewlett-Packard (Canada) Ltd.
6877 Goreway Drive, Mississauga, Ontario L4V 1M8 Canada



CHANGE OF ADDRESS:






To change your address or delete your name from our mailing list please
send us your old address label. Send changes to Hewlett-Packard Journal,
3000 Hanover Street, Palo Alto, California 94304 U.S.A. Allow 60 days.






